- 15 October 2014, 3 commits
-
-
Committed by Michael S. Tsirkin
The virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns; virtio-blk violated this rule by calling add_disk(), which causes the VQs to be used directly within probe. To fix, call virtio_device_ready() before using the VQs. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
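For illustration, the probe ordering ends up looking roughly like this (a minimal sketch; the elided setup and error handling are assumptions, not the full driver code):

    static int virtblk_probe(struct virtio_device *vdev)
    {
        struct virtio_blk *vblk;

        /* ... allocate vblk, set up the virtqueues and the request queue ... */

        /*
         * Tell the core we are ready (sets DRIVER_OK) *before* add_disk(),
         * because add_disk() can trigger I/O on the virtqueues right away.
         */
        virtio_device_ready(vdev);

        add_disk(vblk->disk);
        return 0;
    }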
-
Committed by Michael S. Tsirkin
config_mutex served two purposes: prevent multiple concurrent config change handlers, and synchronize access to the config_enable flag. Since commit dbf2576e ("workqueue: make all workqueues non-reentrant"), all workqueues are non-reentrant, and config_enable is now gone. Get rid of the unnecessary lock. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Committed by Michael S. Tsirkin
Now that the virtio core ensures config changes don't arrive during probing, drop the config_enable flag in virtio-blk. On removal, a flush is now sufficient to guarantee that no change work is queued. This helps simplify the driver, and will allow setting DRIVER_OK earlier without losing config change notifications. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
- 02 July 2014, 1 commit
-
-
Committed by Ming Lei
Firstly, this patch supports more than one virtual queue for a virtio-blk device. Secondly, it maps each virtual queue to a blk-mq hardware queue. With this approach, both scalability and performance can be improved. Signed-off-by: Ming Lei <ming.lei@canonical.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
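A rough sketch of the mapping idea (the vqs[] array, the virtio_blk_vq struct and the helper name are illustrative assumptions, not necessarily the patch's exact layout):

    struct virtio_blk_vq {
        struct virtqueue *vq;   /* one vring per blk-mq hardware queue */
        spinlock_t lock;
    };

    /* in probe: vblk->tag_set.nr_hw_queues = vblk->num_vqs; */

    static struct virtio_blk_vq *virtblk_get_vq(struct virtio_blk *vblk,
                                                struct blk_mq_hw_ctx *hctx)
    {
        /* hctx->queue_num identifies the hardware queue, hence the vq */
        return &vblk->vqs[hctx->queue_num];
    }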
-
- 30 May 2014, 1 commit
-
-
Committed by Ming Lei
Firstly, it isn't necessary to hold vblk->vq_lock when notifying the hypervisor about queued I/O. Secondly, virtqueue_notify() causes a world switch and may take a long time on some hypervisors (such as qemu-arm), so it isn't good to hold the lock and block other vCPUs. On an arm64 quad-core VM (qemu-kvm), the patch increases I/O performance a lot with VIRTIO_RING_F_EVENT_IDX enabled:
- without the patch: 14K IOPS
- with the patch: 34K IOPS
fio script:
[global]
direct=1
bsrange=4k-4k
timeout=10
numjobs=4
ioengine=libaio
iodepth=64
filename=/dev/vdc
group_reporting=1
[f1]
rw=randread
Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Ming Lei <ming.lei@canonical.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org # 3.13+ Signed-off-by: Jens Axboe <axboe@fb.com>
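The standard pattern for kicking outside the lock looks roughly like this (a sketch assuming a per-device vq_lock; not the verbatim diff):

    spin_lock_irqsave(&vblk->vq_lock, flags);
    err = __virtblk_add_req(vblk->vq, vbr, vbr->sg, num);
    /* decide whether a notification is needed while still under the lock */
    notify = virtqueue_kick_prepare(vblk->vq);
    spin_unlock_irqrestore(&vblk->vq_lock, flags);

    /* the expensive exit to the hypervisor happens without the lock held */
    if (notify)
        virtqueue_notify(vblk->vq);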
-
- 29 May 2014, 1 commit
-
-
Committed by Christoph Hellwig
There is no need for drivers to control hardware context allocation now that we do the context-to-node mapping in common code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 27 May 2014, 1 commit
-
-
Committed by Ming Lei
When there aren't enough vring descriptors to add a request to the vq, blk-mq is put into the stopped state until some of the pending descriptors are completed and freed. Unfortunately, the vq's interrupt may arrive just before blk-mq's BLK_MQ_S_STOPPED flag is set, so blk-mq may be left stopped even though lots of descriptors were completed and freed in the interrupt handler. The worst case is that all pending descriptors are freed in the interrupt handler and the queue stays stopped forever. This patch fixes the problem by starting/stopping blk-mq while holding vq_lock. Cc: Jens Axboe <axboe@kernel.dk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@fb.com> Conflicts: drivers/block/virtio_blk.c
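The race closes once both sides manipulate the queue state under the same lock; a sketch (blk-mq helper names and signatures changed between releases, so treat the exact calls as assumptions):

    /* submission path, under vq_lock */
    spin_lock_irqsave(&vblk->vq_lock, flags);
    err = __virtblk_add_req(vblk->vq, vbr, vbr->sg, num);
    if (err == -ENOSPC || err == -ENOMEM)
        blk_mq_stop_hw_queue(hctx);   /* ring full: stop while still locked */
    spin_unlock_irqrestore(&vblk->vq_lock, flags);

    /* completion path (virtblk_done), under the same vq_lock */
    spin_lock_irqsave(&vblk->vq_lock, flags);
    while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL)
        blk_mq_end_io(vbr->req, virtblk_result(vbr));
    /* restart under the same lock now that descriptors were freed */
    blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
    spin_unlock_irqrestore(&vblk->vq_lock, flags);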
-
- 16 May 2014, 1 commit
-
-
Committed by Ming Lei
When there aren't enough vring descriptors to add a request to the vq, blk-mq is put into the stopped state until some of the pending descriptors are completed and freed. Unfortunately, the vq's interrupt may arrive just before blk-mq's BLK_MQ_S_STOPPED flag is set, so blk-mq may be left stopped even though lots of descriptors were completed and freed in the interrupt handler. The worst case is that all pending descriptors are freed in the interrupt handler and the queue stays stopped forever. This patch fixes the problem by starting/stopping blk-mq while holding vq_lock. Cc: Jens Axboe <axboe@kernel.dk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 17 April 2014, 1 commit
-
-
Committed by Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 16 April 2014, 3 commits
-
-
Committed by Christoph Hellwig
Add a new blk_mq_tag_set structure that gets set up before we initialize the queue. A single blk_mq_tag_set structure can be shared by multiple queues. Signed-off-by: Christoph Hellwig <hch@lst.de> Modular export of blk_mq_{alloc,free}_tagset added by me. Signed-off-by: Jens Axboe <axboe@fb.com>
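How a driver typically fills in the shared set before creating queues, as a sketch (the concrete values and the virtio_mq_ops/virtblk_req names are illustrative assumptions loosely modelled on virtio-blk):

    struct blk_mq_tag_set tag_set = { 0 };
    struct request_queue *q;
    int err;

    tag_set.ops = &virtio_mq_ops;            /* driver's blk_mq_ops */
    tag_set.nr_hw_queues = 1;
    tag_set.queue_depth = 64;
    tag_set.numa_node = NUMA_NO_NODE;
    tag_set.cmd_size = sizeof(struct virtblk_req);
    tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
    tag_set.driver_data = vblk;

    err = blk_mq_alloc_tag_set(&tag_set);
    if (err)
        goto out;
    q = blk_mq_init_queue(&tag_set);         /* several queues may share one set */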
-
Committed by Christoph Hellwig
The current blk_mq_init_commands/blk_mq_free_commands interface has two problems: 1) Because only the constructor is passed to blk_mq_init_commands, there is no easy way to clean up when a command initialization fails. The current code simply leaks the allocations done in the constructor. 2) There is no good place to call blk_mq_free_commands: before blk_cleanup_queue there is no guarantee that all outstanding commands have completed, so we can't free them yet. After blk_cleanup_queue the queue has usually been freed. This can be worked around by grabbing an unconditional reference before calling blk_cleanup_queue and dropping it after blk_mq_free_commands is done, although that's not exactly pretty and driver writers are guaranteed to get it wrong sooner or later. Both issues are easily fixed by making the request constructor and destructor normal blk_mq_ops methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Christoph Hellwig
Drivers can reach their private data easily using the blk_mq_rq_to_pdu helper and don't need req->special. By not initializing it, code can be simplified nicely, and we also shave off a few more instructions from the I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
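The per-request driver data lives directly behind the request once cmd_size is set in the tag set, so the lookup is plain pointer arithmetic; a sketch (the queue_rq prototype has changed across kernel versions, so this is illustrative only):

    /* probe: tag_set.cmd_size = sizeof(struct virtblk_req); */

    static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
    {
        /* no req->special needed: the pdu sits right after struct request */
        struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);

        vbr->req = req;
        /* ... build the descriptor and add it to the virtqueue ... */
        return BLK_MQ_RQ_QUEUE_OK;
    }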
-
- 24 March 2014, 1 commit
-
-
Committed by Rusty Russell
Venkatesh spake thus: virtio-blk set the default queue depth to 64 requests, which was insufficient for high-IOPS devices. Instead, set the blk-queue depth to the device's virtqueue depth divided by two (each I/O requires at least two VQ entries). But behold, Ted added a module parameter: also allow the queue depth to be something which can be set at module load time or via a kernel boot-time parameter, for testing/benchmarking purposes. And I rewrote it substantially, mainly to take VIRTIO_RING_F_INDIRECT_DESC into account. As QEMU sets the vq size for PCI to 128, Venkatesh's patch wouldn't have made a change. This version does (since QEMU also offers VIRTIO_RING_F_INDIRECT_DESC). Inspired-by: "Theodore Ts'o" <tytso@mit.edu> Based-on-the-true-story-of: Venkatesh Srinivas <venkateshs@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: virtio-dev@lists.oasis-open.org Cc: virtualization@lists.linux-foundation.org Cc: Frank Swiderski <fes@google.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
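The gist of the sizing logic, as a sketch (the module parameter name and the use of vq->num_free are assumptions based on the description above):

    static unsigned int virtblk_queue_depth;
    module_param_named(queue_depth, virtblk_queue_depth, uint, 0444);

    /* in probe, after the virtqueue is set up */
    if (!virtblk_queue_depth) {
        virtblk_queue_depth = vblk->vq->num_free;
        /* without indirect descriptors each request eats at least two entries */
        if (!virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC))
            virtblk_queue_depth /= 2;
    }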
-
- 15 March 2014, 1 commit
-
-
Committed by Jens Axboe
If drivers do dynamic allocation in the hardware command init path, then we need to be able to handle and return failures. And if they do allocations or mappings in the init command path, then we need a cleanup function to free up that space at exit time. So add blk_mq_free_commands() as the cleanup function. This is required for the mtip32xx driver conversion to blk-mq. Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 13 March 2014, 1 commit
-
-
Committed by Rusty Russell
A bad implementation of virtio might cause us to mark the virtqueue broken: we'll dev_err() in that case, and the device is useless, but let's not BUG_ON(). ENOMEM or ENOSPC implies the ring is full, and we should try again later (-ENOMEM is documented to happen, but doesn't, as we fall through to ENOSPC). EIO means it's broken. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
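Roughly what the submission-path error handling looks like under this policy, sketched with the blk-mq return codes of that era (the helper name is an assumption):

    err = __virtblk_add_req(vblk->vq, vbr, vbr->sg, num);
    if (err) {
        /* -ENOSPC (and in principle -ENOMEM): ring is full, retry later */
        if (err == -ENOSPC || err == -ENOMEM) {
            blk_mq_stop_hw_queue(hctx);
            return BLK_MQ_RQ_QUEUE_BUSY;
        }
        /* anything else (e.g. -EIO): the virtqueue is broken, fail the request */
        return BLK_MQ_RQ_QUEUE_ERROR;
    }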
-
- 11 February 2014, 1 commit
-
-
Committed by Christoph Hellwig
Make sure to complete requests on the submitting CPU. Previously this was done in blk_mq_end_io, but the responsibility shifted to the drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 20 November 2013, 1 commit
-
-
Committed by Shaohua Li
It isn't safe to call virtqueue_kick() without holding vblk->vq_lock. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Shaohua Li <shli@fusionio.com> Fixed another condition of virtqueue_kick() not holding the lock. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 November 2013, 1 commit
-
-
Committed by Jens Axboe
Switch virtio-blk from the dual support for old-style requests and bios over to blk-mq (block multiqueue). Acked-by: Asias He <asias@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 29 October 2013, 1 commit
-
-
Committed by Heinz Graalfs
In case virtqueue_get_buf() returns a NULL pointer, verify whether the virtqueue is broken in order to leave the while loop. Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
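The resulting completion loop, sketched (structure only; locking and the actual request completion are elided):

    do {
        virtqueue_disable_cb(vq);
        while ((vbr = virtqueue_get_buf(vq, &len)) != NULL) {
            /* complete the finished request */
        }
        /* a broken device would otherwise leave us looping here forever */
        if (unlikely(virtqueue_is_broken(vq)))
            break;
    } while (!virtqueue_enable_cb(vq));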
-
- 17 October 2013, 1 commit
-
-
Committed by Rusty Russell
This lets the transport do endian conversion if necessary, and insulates the drivers from the difference. Most drivers can use the simple helpers virtio_cread() and virtio_cwrite(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
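Typical usage of the helpers, as a sketch (field names taken from struct virtio_blk_config):

    u64 capacity;
    u8 writeback = 1;

    /* read a config field; the transport does any endian conversion */
    virtio_cread(vdev, struct virtio_blk_config, capacity, &capacity);

    /* write one back the same way */
    virtio_cwrite(vdev, struct virtio_blk_config, wce, &writeback);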
-
- 23 September 2013, 1 commit
-
-
Committed by Aaron Lu
The freeze and restore functions defined in virtio drivers are used for suspend and hibernate, so CONFIG_PM_SLEEP is more appropriate than CONFIG_PM. This patch replaces CONFIG_PM with CONFIG_PM_SLEEP for all virtio drivers that implement freeze and restore callbacks. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
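The shape of the change, sketched (callback bodies elided):

    #ifdef CONFIG_PM_SLEEP                /* previously: #ifdef CONFIG_PM */
    static int virtblk_freeze(struct virtio_device *vdev)
    {
        /* quiesce the device and queue before suspend/hibernate */
        return 0;
    }

    static int virtblk_restore(struct virtio_device *vdev)
    {
        /* re-create the virtqueues on resume */
        return 0;
    }
    #endif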
-
- 20 May 2013, 1 commit
-
-
Committed by Jonghwan Choi
Add missing 'static' qualifiers. Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
- 20 March 2013, 4 commits
-
-
Committed by Rusty Russell
It's simply a flag as to whether we have data now, so make it an explicit function parameter rather than a member of struct virtblk_req. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Asias He <asias@redhat.com>
-
Committed by Paolo Bonzini
(This is a respin of Paolo Bonzini's patch, but it calls virtqueue_add_sgs() instead of his multi-part API.) This is similar to the previous patch, but a bit more radical because the bio and req paths now share the buffer construction code. Because the req path doesn't use vbr->sg, however, we need to add a couple of arguments to __virtblk_add_req. We also need to teach __virtblk_add_req how to build SCSI command requests. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Asias He <asias@redhat.com>
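A sketch of what building a request with virtqueue_add_sgs() looks like (argument names are illustrative and the SCSI-command framing mentioned above is omitted):

    static int __virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr,
                                 struct scatterlist *data_sg, bool have_data)
    {
        struct scatterlist hdr, status, *sgs[3];
        unsigned int num_out = 0, num_in = 0;

        sg_init_one(&hdr, &vbr->out_hdr, sizeof(vbr->out_hdr));
        sgs[num_out++] = &hdr;

        if (have_data) {
            if (vbr->out_hdr.type & VIRTIO_BLK_T_OUT)
                sgs[num_out++] = data_sg;            /* guest -> host */
            else
                sgs[num_out + num_in++] = data_sg;   /* host -> guest */
        }

        sg_init_one(&status, &vbr->status, sizeof(vbr->status));
        sgs[num_out + num_in++] = &status;

        return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
    }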
-
Committed by Paolo Bonzini
(This is a respin of Paolo Bonzini's patch, but it calls virtqueue_add_sgs() instead of his multi-part API.) Move the creation of the request header and response footer to __virtblk_add_req. vbr->sg only contains the data scatterlist; the header/footer are added separately using virtqueue_add_sgs(). With this change, virtio-blk (with use_bio) no longer relies on the virtio functions ignoring the end markers in a scatterlist. The next patch will do the same for the other path. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Asias He <asias@redhat.com>
-
Committed by Paolo Bonzini
Right now, both virtblk_add_req and virtblk_add_req_wait call virtqueue_add_buf. To prepare for the next patches, abstract the call to virtqueue_add_buf into a new function, __virtblk_add_req, and include the waiting logic directly in virtblk_add_req. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
- 12 March 2013, 1 commit
-
-
Committed by Milos Vyletel
When a virtio-blk device is resized from the host (using block_resize from QEMU), emit a KOBJ_CHANGE uevent to notify the guest about the change. This allows the user to have custom udev rules which take whatever action is wanted when such an event occurs. As a proof of concept I've created simple udev rules that automatically resize the filesystem on a virtio-blk device:
ACTION=="change", KERNEL=="vd*", \
 ENV{RESIZE}=="1", \
 ENV{ID_FS_TYPE}=="ext[3-4]", \
 RUN+="/sbin/resize2fs /dev/%k"
ACTION=="change", KERNEL=="vd*", \
 ENV{RESIZE}=="1", \
 ENV{ID_FS_TYPE}=="LVM2_member", \
 RUN+="/sbin/pvresize /dev/%k"
Signed-off-by: Milos Vyletel <milos.vyletel@sde.cz> Tested-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor simplification)
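On the kernel side, the notification boils down to something like this (a sketch; the RESIZE=1 environment variable is what the udev rules above key on, and the surrounding capacity update is elided):

    static void virtblk_config_changed_work(struct work_struct *work)
    {
        struct virtio_blk *vblk =
            container_of(work, struct virtio_blk, config_work);
        char *envp[] = { "RESIZE=1", NULL };

        /* ... re-read the capacity from config space and update the disk ... */

        kobject_uevent_env(&disk_to_dev(vblk->disk)->kobj, KOBJ_CHANGE, envp);
    }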
-
- 04 January 2013, 1 commit
-
-
Committed by Greg Kroah-Hartman
CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, __devinitdata, __devinitconst, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Mike Miller <mike.miller@hp.com> Cc: Chirag Kantharia <chirag.kantharia@hp.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Jim Paris <jim@jtan.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: NeilBrown <neilb@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Tao Guo <Tao.Guo@emc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 02 January 2013, 1 commit
-
-
Committed by Alexander Graf
When a file system is mounted on a virtio-blk disk and we then remove the device and reattach it, the reattached disk gets the same disk name and ids as the hot-removed one. This leads to very nasty effects, mostly rendering the newly attached device completely unusable. Trying what happens when I do the same thing with a USB device, I saw that the sd node simply doesn't get freed when a device gets forcefully removed. Imitate the same behavior for vd devices. This way broken vd devices are simply never freed and newly attached ones keep working just fine. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org
-
- 28 September 2012, 4 commits
-
-
Committed by Asias He
This reduces unnecessary interrupts that the host could send to the guest while the guest is in the middle of irq handling. If one vcpu is handling the irq while another interrupt comes in, then in handle_edge_irq() the guest will mask the interrupt via mask_msi_irq(), which is a very heavy operation that goes all the way down to the host. Here are some performance numbers on qemu:
Before:
-------------------------------------
seq-read : io=0 B, bw=269730KB/s, iops=67432 , runt= 62200msec
seq-write : io=0 B, bw=339716KB/s, iops=84929 , runt= 49386msec
rand-read : io=0 B, bw=270435KB/s, iops=67608 , runt= 62038msec
rand-write: io=0 B, bw=354436KB/s, iops=88608 , runt= 47335msec
clat (usec): min=101 , max=138052 , avg=14822.09, stdev=11771.01
clat (usec): min=96 , max=81543 , avg=11798.94, stdev=7735.60
clat (usec): min=128 , max=140043 , avg=14835.85, stdev=11765.33
clat (usec): min=109 , max=147207 , avg=11337.09, stdev=5990.35
cpu : usr=15.93%, sys=60.37%, ctx=7764972, majf=0, minf=54
cpu : usr=32.73%, sys=120.49%, ctx=7372945, majf=0, minf=1
cpu : usr=18.84%, sys=58.18%, ctx=7775420, majf=0, minf=1
cpu : usr=24.20%, sys=59.85%, ctx=8307886, majf=0, minf=0
vdb: ios=8389107/8368136, merge=0/0, ticks=19457874/14616506, in_queue=34206098, util=99.68%
43: interrupt in total: 887320
fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting --ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16 --ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0 --filename=/dev/vdb --name=seq-read --stonewall --rw=read --name=seq-write --stonewall --rw=write --name=rnd-read --stonewall --rw=randread --name=rnd-write --stonewall --rw=randwrite
After:
-------------------------------------
seq-read : io=0 B, bw=309503KB/s, iops=77375 , runt= 54207msec
seq-write : io=0 B, bw=448205KB/s, iops=112051 , runt= 37432msec
rand-read : io=0 B, bw=311254KB/s, iops=77813 , runt= 53902msec
rand-write: io=0 B, bw=377152KB/s, iops=94287 , runt= 44484msec
clat (usec): min=81 , max=90588 , avg=12946.06, stdev=9085.94
clat (usec): min=57 , max=72264 , avg=8967.97, stdev=5951.04
clat (usec): min=29 , max=101046 , avg=12889.95, stdev=9067.91
clat (usec): min=52 , max=106152 , avg=10660.56, stdev=4778.19
cpu : usr=15.05%, sys=57.92%, ctx=77109411, majf=0, minf=54
cpu : usr=26.78%, sys=101.40%, ctx=7387891, majf=0, minf=2
cpu : usr=19.03%, sys=58.17%, ctx=7681976, majf=0, minf=8
cpu : usr=24.65%, sys=58.34%, ctx=8442632, majf=0, minf=4
vdb: ios=8389086/8361888, merge=0/0, ticks=17243780/12742010, in_queue=30078377, util=99.59%
43: interrupt in total: 1259639
fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting --ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16 --ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0 --filename=/dev/vdb --name=seq-read --stonewall --rw=read --name=seq-write --stonewall --rw=write --name=rnd-read --stonewall --rw=randread --name=rnd-write --stonewall --rw=randwrite
Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Committed by Dan Carpenter
Smatch complains about the inconsistent NULL checking here. Fix it to return NULL on failure. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (fixed accidental deletion)
-
Committed by Asias He
We need to support both REQ_FLUSH and REQ_FUA for the bio-based path, since it does not get the sequencing of REQ_FUA into REQ_FLUSH that request-based drivers can request.
REQ_FLUSH is emulated by:
A) If the bio has no data to write:
1. Send VIRTIO_BLK_T_FLUSH to the device.
2. In the flush I/O completion handler, finish the bio.
B) If the bio has data to write:
1. Send VIRTIO_BLK_T_FLUSH to the device.
2. In the flush I/O completion handler, send the actual write data to the device.
3. In the write I/O completion handler, finish the bio.
REQ_FUA is emulated by:
1. Send the actual write data to the device.
2. In the write I/O completion handler, send VIRTIO_BLK_T_FLUSH to the device.
3. In the flush I/O completion handler, finish the bio.
Changes in v7:
- Use vbr->flags to trace the request type
- Drop the unnecessary struct virtio_blk *vblk parameter
- Reuse struct virtblk_req in the bio done function
Changes in v6:
- Rework the REQ_FLUSH and REQ_FUA emulation order
Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Tejun Heo <tj@kernel.org> Cc: Shaohua Li <shli@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Committed by Asias He
This patch introduces bio-based IO path for virtio-blk. Compared to request-based IO path, bio-based IO path uses driver provided ->make_request_fn() method to bypasses the IO scheduler. It handles the bio to device directly without allocating a request in block layer. This reduces the IO path in guest kernel to achieve high IOPS and lower latency. The downside is that guest can not use the IO scheduler to merge and sort requests. However, this is not a big problem if the backend disk in host side uses faster disk device. When the bio-based IO path is not enabled, virtio-blk still uses the original request-based IO path, no performance difference is observed. Using a slow device e.g. normal SATA disk, the bio-based IO path for sequential read and write are slower than req-based IO path due to lack of merge in guest kernel. So we make the bio-based path optional. Performance evaluation: ----------------------------- 1) Fio test is performed in a 8 vcpu guest with ramdisk based guest using kvm tool. Short version: With bio-based IO path, sequential read/write, random read/write IOPS boost : 28%, 24%, 21%, 16% Latency improvement: 32%, 17%, 21%, 16% Long version: With bio-based IO path: seq-read : io=2048.0MB, bw=116996KB/s, iops=233991 , runt= 17925msec seq-write : io=2048.0MB, bw=100829KB/s, iops=201658 , runt= 20799msec rand-read : io=3095.7MB, bw=112134KB/s, iops=224268 , runt= 28269msec rand-write: io=3095.7MB, bw=96198KB/s, iops=192396 , runt= 32952msec clat (usec): min=0 , max=2631.6K, avg=58716.99, stdev=191377.30 clat (usec): min=0 , max=1753.2K, avg=66423.25, stdev=81774.35 clat (usec): min=0 , max=2915.5K, avg=61685.70, stdev=120598.39 clat (usec): min=0 , max=1933.4K, avg=76935.12, stdev=96603.45 cpu : usr=74.08%, sys=703.84%, ctx=29661403, majf=21354, minf=22460954 cpu : usr=70.92%, sys=702.81%, ctx=77219828, majf=13980, minf=27713137 cpu : usr=72.23%, sys=695.37%, ctx=88081059, majf=18475, minf=28177648 cpu : usr=69.69%, sys=654.13%, ctx=145476035, majf=15867, minf=26176375 With request-based IO path: seq-read : io=2048.0MB, bw=91074KB/s, iops=182147 , runt= 23027msec seq-write : io=2048.0MB, bw=80725KB/s, iops=161449 , runt= 25979msec rand-read : io=3095.7MB, bw=92106KB/s, iops=184211 , runt= 34416msec rand-write: io=3095.7MB, bw=82815KB/s, iops=165630 , runt= 38277msec clat (usec): min=0 , max=1932.4K, avg=77824.17, stdev=170339.49 clat (usec): min=0 , max=2510.2K, avg=78023.96, stdev=146949.15 clat (usec): min=0 , max=3037.2K, avg=74746.53, stdev=128498.27 clat (usec): min=0 , max=1363.4K, avg=89830.75, stdev=114279.68 cpu : usr=53.28%, sys=724.19%, ctx=37988895, majf=17531, minf=23577622 cpu : usr=49.03%, sys=633.20%, ctx=205935380, majf=18197, minf=27288959 cpu : usr=55.78%, sys=722.40%, ctx=101525058, majf=19273, minf=28067082 cpu : usr=56.55%, sys=690.83%, ctx=228205022, majf=18039, minf=26551985 2) Fio test is performed in a 8 vcpu guest with Fusion-IO based guest using kvm tool. 
Short version: With bio-based IO path, sequential read/write, random read/write IOPS boost : 11%, 11%, 13%, 10% Latency improvement: 10%, 10%, 12%, 10% Long Version: With bio-based IO path: read : io=2048.0MB, bw=58920KB/s, iops=117840 , runt= 35593msec write: io=2048.0MB, bw=64308KB/s, iops=128616 , runt= 32611msec read : io=3095.7MB, bw=59633KB/s, iops=119266 , runt= 53157msec write: io=3095.7MB, bw=62993KB/s, iops=125985 , runt= 50322msec clat (usec): min=0 , max=1284.3K, avg=128109.01, stdev=71513.29 clat (usec): min=94 , max=962339 , avg=116832.95, stdev=65836.80 clat (usec): min=0 , max=1846.6K, avg=128509.99, stdev=89575.07 clat (usec): min=0 , max=2256.4K, avg=121361.84, stdev=82747.25 cpu : usr=56.79%, sys=421.70%, ctx=147335118, majf=21080, minf=19852517 cpu : usr=61.81%, sys=455.53%, ctx=143269950, majf=16027, minf=24800604 cpu : usr=63.10%, sys=455.38%, ctx=178373538, majf=16958, minf=24822612 cpu : usr=62.04%, sys=453.58%, ctx=226902362, majf=16089, minf=23278105 With request-based IO path: read : io=2048.0MB, bw=52896KB/s, iops=105791 , runt= 39647msec write: io=2048.0MB, bw=57856KB/s, iops=115711 , runt= 36248msec read : io=3095.7MB, bw=52387KB/s, iops=104773 , runt= 60510msec write: io=3095.7MB, bw=57310KB/s, iops=114619 , runt= 55312msec clat (usec): min=0 , max=1532.6K, avg=142085.62, stdev=109196.84 clat (usec): min=0 , max=1487.4K, avg=129110.71, stdev=114973.64 clat (usec): min=0 , max=1388.6K, avg=145049.22, stdev=107232.55 clat (usec): min=0 , max=1465.9K, avg=133585.67, stdev=110322.95 cpu : usr=44.08%, sys=590.71%, ctx=451812322, majf=14841, minf=17648641 cpu : usr=48.73%, sys=610.78%, ctx=418953997, majf=22164, minf=26850689 cpu : usr=45.58%, sys=581.16%, ctx=714079216, majf=21497, minf=22558223 cpu : usr=48.40%, sys=599.65%, ctx=656089423, majf=16393, minf=23824409 3) Fio test is performed in a 8 vcpu guest with normal SATA based guest using kvm tool. 
Short version: With bio-based IO path, sequential read/write, random read/write IOPS boost : -10%, -10%, 4.4%, 0.5% Latency improvement: -12%, -15%, 2.5%, 0.8% Long Version: With bio-based IO path: read : io=124812KB, bw=36537KB/s, iops=9060 , runt= 3416msec write: io=169180KB, bw=24406KB/s, iops=6065 , runt= 6932msec read : io=256200KB, bw=2089.3KB/s, iops=520 , runt=122630msec write: io=257988KB, bw=1545.7KB/s, iops=384 , runt=166910msec clat (msec): min=1 , max=1527 , avg=28.06, stdev=89.54 clat (msec): min=2 , max=344 , avg=41.12, stdev=38.70 clat (msec): min=8 , max=1984 , avg=490.63, stdev=207.28 clat (msec): min=33 , max=4131 , avg=659.19, stdev=304.71 cpu : usr=4.85%, sys=17.15%, ctx=31593, majf=0, minf=7 cpu : usr=3.04%, sys=11.45%, ctx=39377, majf=0, minf=0 cpu : usr=0.47%, sys=1.59%, ctx=262986, majf=0, minf=16 cpu : usr=0.47%, sys=1.46%, ctx=337410, majf=0, minf=0 With request-based IO path: read : io=150120KB, bw=40420KB/s, iops=10037 , runt= 3714msec write: io=194932KB, bw=27029KB/s, iops=6722 , runt= 7212msec read : io=257136KB, bw=2001.1KB/s, iops=498 , runt=128443msec write: io=258276KB, bw=1537.2KB/s, iops=382 , runt=168028msec clat (msec): min=1 , max=1542 , avg=24.84, stdev=32.45 clat (msec): min=3 , max=628 , avg=35.62, stdev=39.71 clat (msec): min=8 , max=2540 , avg=503.28, stdev=236.97 clat (msec): min=41 , max=4398 , avg=653.88, stdev=302.61 cpu : usr=3.91%, sys=15.75%, ctx=26968, majf=0, minf=23 cpu : usr=2.50%, sys=10.56%, ctx=19090, majf=0, minf=0 cpu : usr=0.16%, sys=0.43%, ctx=20159, majf=0, minf=16 cpu : usr=0.18%, sys=0.53%, ctx=81364, majf=0, minf=0 How to use: ----------------------------- Add 'virtio_blk.use_bio=1' to kernel cmdline or 'modprobe virtio_blk use_bio=1' to enable ->make_request_fn() based I/O path. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Tejun Heo <tj@kernel.org> Cc: Shaohua Li <shli@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMinchan Kim <minchan.kim@gmail.com> Signed-off-by: NAsias He <asias@redhat.com> Acked-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 30 July 2012, 4 commits
-
-
Committed by Paolo Bonzini
This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature, which exposes the cache mode in the configuration space and lets the driver modify it. The cache mode is exposed via sysfs. Even if the host does not support the new feature, the cache mode is visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Committed by Asias He
The block layer will allocate a spinlock for the queue if the driver does not provide one in blk_init_queue(). The reason to use the internal spinlock is that blk_cleanup_queue() switches to the internal spinlock in the cleanup code path:
if (q->queue_lock != &q->__queue_lock)
    q->queue_lock = &q->__queue_lock;
However, processes which are in D state might have taken the driver-provided spinlock; when those processes wake up, they would release the block-layer-provided spinlock:
=====================================
[ BUG: bad unlock balance detected! ]
3.4.0-rc7+ #238 Not tainted
-------------------------------------
fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at: [<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!
other info that might help us debug this:
1 lock held by fio/3587:
#0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff8132661a>] get_request_wait+0x19a/0x250
Other drivers use the block layer provided spinlock as well, e.g. SCSI. Switching to the block layer provided spinlock saves a bit of memory and does not increase lock contention. Performance tests show no real difference before and after this patch.
Changes in v2: Improve commit log as Michael suggested.
Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
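The driver-side change is essentially just passing NULL so the block layer supplies its own lock; a sketch (the request function name and error label are assumptions):

    /* before: a driver-owned lock was handed to the block layer */
    /*   q = blk_init_queue(do_virtblk_request, &vblk->lock);    */

    /* after: let the block layer allocate and manage the queue lock */
    q = blk_init_queue(do_virtblk_request, NULL);
    if (!q)
        goto out_put_disk;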
-
Committed by Asias He
blk_cleanup_queue() calls blk_drain_queue() to drain all the requests before the queue DEAD marking. If we reset the device before blk_cleanup_queue(), the drain would fail.
1) If the queue is stopped in do_virtblk_request() because the device is full, q->request_fn() will not be called:
blk_drain_queue() {
    while (true) {
        ...
        if (!list_empty(&q->queue_head))
            __blk_run_queue(q) {
                if (queue is not stopped)
                    q->request_fn()
            }
        ...
    }
}
Not resetting the device before blk_cleanup_queue() gives the interrupt handler blk_done() the chance to start the queue.
2) In commit b79d866c, we abort requests dispatched to the driver before blk_cleanup_queue(). There is a race if requests are dispatched to the driver after the abort and before the queue DEAD mark. To fix this, instead of aborting the requests explicitly, we can just reset the device after blk_cleanup_queue() so that the device can complete all the requests before the queue DEAD marking in the drain process.
Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Committed by Asias He
del_gendisk() might not return due to failing to remove the /sys/block/vda/serial sysfs entry when another thread (udev) is trying to read it:
virtblk_remove()
  vdev->config->reset() : guest will not kick us through interrupt
  del_gendisk()
    device_del()
      kobject_del(): got stuck, sysfs entry ref count non zero
sysfs_open_file(): user space process reads /sys/block/vda/serial
  sysfs_get_active() : got sysfs entry ref count
  dev_attr_show()
    virtblk_serial_show()
      blk_execute_rq() : got stuck, interrupt is disabled, request cannot be finished
This patch fixes it by calling del_gendisk() before we disable the guest's interrupt, so that the request sent in virtblk_serial_show() will be finished and del_gendisk() will succeed. This fixes another race in the hot-unplug process. It is safe to call del_gendisk(vblk->disk) before flush_work(&vblk->config_work), which might access vblk->disk, because vblk->disk is not freed until put_disk(vblk->disk).
Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Cc: stable@kernel.org Signed-off-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
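Putting this and the previous fix together, the remove path ends up ordered roughly like this (a sketch, not the verbatim driver code; surrounding teardown is simplified):

    static void virtblk_remove(struct virtio_device *vdev)
    {
        struct virtio_blk *vblk = vdev->priv;

        del_gendisk(vblk->disk);               /* lets pending sysfs reads finish */
        blk_cleanup_queue(vblk->disk->queue);  /* drain while the device still works */

        vdev->config->reset(vdev);             /* only now stop the device */
        flush_work(&vblk->config_work);        /* safe after del_gendisk() */
        vdev->config->del_vqs(vdev);

        put_disk(vblk->disk);
        kfree(vblk);
    }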
-
- 22 May 2012, 2 commits
-
-
Committed by Asias He
Benchmark shows a small performance improvement on a Fusion-io device.
Before:
seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec
seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec
rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec
rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec
After:
seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec
seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec
rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec
rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec
Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Committed by Asias He
If we reset the virtio-blk device before the requests already dispatched to the virtio-blk driver from the block layer are finished, we will get stuck in blk_cleanup_queue() and the remove will fail. blk_cleanup_queue() calls blk_drain_queue() to drain all requests queued before the DEAD marking. However, it will never succeed if the device is already stopped. We'll have q->in_flight[] > 0, so the drain will not finish.
How to reproduce the race:
1. hot-plug a virtio-blk device
2. keep reading/writing the device in the guest
3. hot-unplug while the device is busy serving I/O
Test: ~1000 rounds of hot-plug/hot-unplug test passed with this patch.
Changes in v3:
- Drop blk_abort_queue and blk_abort_request
- Use __blk_end_request_all to complete requests dispatched to the driver
Changes in v2:
- Drop req_in_flight
- Use virtqueue_detach_unused_buf to get requests dispatched to the driver
Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
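The core of the fix can be sketched like this (the lock, pool name and error value are assumptions based on the driver of that era):

    /* in virtblk_remove(), after the device has been reset:
     * requests the device never completed are still sitting in the ring,
     * so detach them and end them ourselves instead of waiting forever.
     */
    spin_lock_irqsave(&vblk->lock, flags);
    while ((vbr = virtqueue_detach_unused_buf(vblk->vq)) != NULL) {
        __blk_end_request_all(vbr->req, -EIO);   /* fail it back to the block layer */
        mempool_free(vbr, vblk->pool);
    }
    spin_unlock_irqrestore(&vblk->lock, flags);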
-
- 12 April 2012, 1 commit
-
-
Committed by Ren Mingxin
The current virtio block naming algorithm only supports 18278 (26^3 + 26^2 + 26) disks. If there are more virtio block devices, there will be disks with the same name. Based on commit 3e1a7ff8, add a function virtblk_name_format() for virtio block to support naming a large number of disks.
Notes:
- Our naming scheme is ugly. We are stuck with it for virtio, but don't use it for any new driver: new drivers should name their devices PREFIX%d, where the sequence number can be allocated by ida.
- sd_format_disk_name has exactly the same logic. Moving it to a central place was deferred over worries that this will make people keep using the legacy naming in new drivers. We kept the code identical in case someone wants to deduplicate later.
Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com> Acked-by: Asias He <asias@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
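For reference, the scheme works like sd_format_disk_name(): index 0 maps to vda, 25 to vdz, 26 to vdaa, and so on. A sketch of the algorithm (mirroring the sd logic the note above refers to):

    static int virtblk_name_format(char *prefix, int index, char *buf, int buflen)
    {
        const int base = 'z' - 'a' + 1;
        char *begin = buf + strlen(prefix);
        char *end = buf + buflen;
        char *p;
        int unit;

        p = end - 1;
        *p = '\0';
        unit = base;
        do {
            if (p == begin)
                return -EINVAL;          /* name does not fit in the buffer */
            *--p = 'a' + (index % unit); /* emit least significant "digit" */
            index = (index / unit) - 1;
        } while (index >= 0);

        memmove(begin, p, end - p);
        memcpy(buf, prefix, strlen(prefix));

        return 0;
    }

    /* usage: virtblk_name_format("vd", index, disk->disk_name, DISK_NAME_LEN); */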
-