1. 30 5月, 2014 1 次提交
    • M
      block: virtio_blk: don't hold spin lock during world switch · e8edca6f
      Ming Lei 提交于
      Firstly, it isn't necessary to hold lock of vblk->vq_lock
      when notifying hypervisor about queued I/O.
      
      Secondly, virtqueue_notify() will cause world switch and
      it may take long time on some hypervisors(such as, qemu-arm),
      so it isn't good to hold the lock and block other vCPUs.
      
      On arm64 quad core VM(qemu-kvm), the patch can increase I/O
      performance a lot with VIRTIO_RING_F_EVENT_IDX enabled:
      	- without the patch: 14K IOPS
      	- with the patch: 34K IOPS
      
      fio script:
      	[global]
      	direct=1
      	bsrange=4k-4k
      	timeout=10
      	numjobs=4
      	ioengine=libaio
      	iodepth=64
      
      	filename=/dev/vdc
      	group_reporting=1
      
      	[f1]
      	rw=randread
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: virtualization@lists.linux-foundation.org
      Signed-off-by: NMing Lei <ming.lei@canonical.com>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: stable@kernel.org # 3.13+
      Signed-off-by: NJens Axboe <axboe@fb.com>
      e8edca6f
  2. 29 5月, 2014 1 次提交
  3. 16 5月, 2014 1 次提交
    • M
      virtio_blk: fix race between start and stop queue · 0c29e93e
      Ming Lei 提交于
      When there isn't enough vring descriptor for adding to vq,
      blk-mq will be put as stopped state until some of pending
      descriptors are completed & freed.
      
      Unfortunately, the vq's interrupt may come just before
      blk-mq's BLK_MQ_S_STOPPED flag is set, so the blk-mq will
      still be kept as stopped even though lots of descriptors
      are completed and freed in the interrupt handler. The worst
      case is that all pending descriptors are freed in the
      interrupt handler, and the queue is kept as stopped forever.
      
      This patch fixes the problem by starting/stopping blk-mq
      with holding vq_lock.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NMing Lei <tom.leiming@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      0c29e93e
  4. 17 4月, 2014 1 次提交
  5. 16 4月, 2014 3 次提交
    • C
      blk-mq: split out tag initialization, support shared tags · 24d2f903
      Christoph Hellwig 提交于
      Add a new blk_mq_tag_set structure that gets set up before we initialize
      the queue.  A single blk_mq_tag_set structure can be shared by multiple
      queues.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      
      Modular export of blk_mq_{alloc,free}_tagset added by me.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      24d2f903
    • C
      blk-mq: add ->init_request and ->exit_request methods · e9b267d9
      Christoph Hellwig 提交于
      The current blk_mq_init_commands/blk_mq_free_commands interface has a
      two problems:
      
       1) Because only the constructor is passed to blk_mq_init_commands there
          is no easy way to clean up when a comman initialization failed.  The
          current code simply leaks the allocations done in the constructor.
      
       2) There is no good place to call blk_mq_free_commands: before
          blk_cleanup_queue there is no guarantee that all outstanding
          commands have completed, so we can't free them yet.  After
          blk_cleanup_queue the queue has usually been freed.  This can be
          worked around by grabbing an unconditional reference before calling
          blk_cleanup_queue and dropping it after blk_mq_free_commands is
          done, although that's not exatly pretty and driver writers are
          guaranteed to get it wrong sooner or later.
      
      Both issues are easily fixed by making the request constructor and
      destructor normal blk_mq_ops methods.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      e9b267d9
    • C
      blk-mq: do not initialize req->special · 9d74e257
      Christoph Hellwig 提交于
      Drivers can reach their private data easily using the blk_mq_rq_to_pdu
      helper and don't need req->special.  By not initializing it code can
      be simplified nicely, and we also shave off a few more instructions from
      the I/O path.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      9d74e257
  6. 24 3月, 2014 1 次提交
    • R
      virtio-blk: base queue-depth on virtqueue ringsize or module param · fc4324b4
      Rusty Russell 提交于
      Venkatash spake thus:
      
        virtio-blk set the default queue depth to 64 requests, which was
        insufficient for high-IOPS devices. Instead set the blk-queue depth to
        the device's virtqueue depth divided by two (each I/O requires at least
        two VQ entries).
      
      But behold, Ted added a module parameter:
      
        Also allow the queue depth to be something which can be set at module
        load time or via a kernel boot-time parameter, for
        testing/benchmarking purposes.
      
      And I rewrote it substantially, mainly to take
      VIRTIO_RING_F_INDIRECT_DESC into account.
      
      As QEMU sets the vq size for PCI to 128, Venkatash's patch wouldn't
      have made a change.  This version does (since QEMU also offers
      VIRTIO_RING_F_INDIRECT_DESC.
      Inspired-by: N"Theodore Ts'o" <tytso@mit.edu>
      Based-on-the-true-story-of: Venkatesh Srinivas <venkateshs@google.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: virtio-dev@lists.oasis-open.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: Frank Swiderski <fes@google.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      fc4324b4
  7. 15 3月, 2014 1 次提交
    • J
      blk-mq: allow blk_mq_init_commands() to return failure · 95363efd
      Jens Axboe 提交于
      If drivers do dynamic allocation in the hardware command init
      path, then we need to be able to handle and return failures.
      
      And if they do allocations or mappings in the init command path,
      then we need a cleanup function to free up that space at exit
      time. So add blk_mq_free_commands() as the cleanup function.
      
      This is required for the mtip32xx driver conversion to blk-mq.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      95363efd
  8. 13 3月, 2014 1 次提交
  9. 11 2月, 2014 1 次提交
  10. 20 11月, 2013 1 次提交
  11. 14 11月, 2013 1 次提交
  12. 29 10月, 2013 1 次提交
  13. 17 10月, 2013 1 次提交
  14. 23 9月, 2013 1 次提交
  15. 20 5月, 2013 1 次提交
  16. 20 3月, 2013 4 次提交
  17. 12 3月, 2013 1 次提交
    • M
      virtio-blk: emit udev event when device is resized · 9d9598b8
      Milos Vyletel 提交于
      When virtio-blk device is resized from host (using block_resize from QEMU) emit
      KOBJ_CHANGE uevent to notify guest about such change. This allows user to have
      custom udev rules which would take whatever action if such event occurs. As a
      proof of concept I've created simple udev rule that automatically resize
      filesystem on virtio-blk device.
      
      ACTION=="change", KERNEL=="vd*", \
              ENV{RESIZE}=="1", \
              ENV{ID_FS_TYPE}=="ext[3-4]", \
              RUN+="/sbin/resize2fs /dev/%k"
      ACTION=="change", KERNEL=="vd*", \
              ENV{RESIZE}=="1", \
              ENV{ID_FS_TYPE}=="LVM2_member", \
              RUN+="/sbin/pvresize /dev/%k"
      Signed-off-by: NMilos Vyletel <milos.vyletel@sde.cz>
      Tested-by: NAsias He <asias@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor simplification)
      9d9598b8
  18. 04 1月, 2013 1 次提交
    • G
      Drivers: block: remove __dev* attributes. · 8d85fce7
      Greg Kroah-Hartman 提交于
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitdata,
      __devinitconst, and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: Chirag Kantharia <chirag.kantharia@hp.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Jim Paris <jim@jtan.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Tao Guo <Tao.Guo@emc.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d85fce7
  19. 02 1月, 2013 1 次提交
    • A
      virtio-blk: Don't free ida when disk is in use · f4953fe6
      Alexander Graf 提交于
      When a file system is mounted on a virtio-blk disk, we then remove it
      and then reattach it, the reattached disk gets the same disk name and
      ids as the hot removed one.
      
      This leads to very nasty effects - mostly rendering the newly attached
      device completely unusable.
      
      Trying what happens when I do the same thing with a USB device, I saw
      that the sd node simply doesn't get free'd when a device gets forcefully
      removed.
      
      Imitate the same behavior for vd devices. This way broken vd devices
      simply are never free'd and newly attached ones keep working just fine.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: stable@kernel.org
      f4953fe6
  20. 28 9月, 2012 4 次提交
    • A
      virtio-blk: Disable callback in virtblk_done() · bb811108
      Asias He 提交于
      This reduces unnecessary interrupts that host could send to guest while
      guest is in the progress of irq handling.
      
      If one vcpu is handling the irq, while another interrupt comes, in
      handle_edge_irq(), the guest will mask the interrupt via mask_msi_irq()
      which is a very heavy operation that goes all the way down to host.
      
       Here are some performance numbers on qemu:
      
       Before:
       -------------------------------------
         seq-read  : io=0 B, bw=269730KB/s, iops=67432 , runt= 62200msec
         seq-write : io=0 B, bw=339716KB/s, iops=84929 , runt= 49386msec
         rand-read : io=0 B, bw=270435KB/s, iops=67608 , runt= 62038msec
         rand-write: io=0 B, bw=354436KB/s, iops=88608 , runt= 47335msec
           clat (usec): min=101 , max=138052 , avg=14822.09, stdev=11771.01
           clat (usec): min=96 , max=81543 , avg=11798.94, stdev=7735.60
           clat (usec): min=128 , max=140043 , avg=14835.85, stdev=11765.33
           clat (usec): min=109 , max=147207 , avg=11337.09, stdev=5990.35
         cpu          : usr=15.93%, sys=60.37%, ctx=7764972, majf=0, minf=54
         cpu          : usr=32.73%, sys=120.49%, ctx=7372945, majf=0, minf=1
         cpu          : usr=18.84%, sys=58.18%, ctx=7775420, majf=0, minf=1
         cpu          : usr=24.20%, sys=59.85%, ctx=8307886, majf=0, minf=0
         vdb: ios=8389107/8368136, merge=0/0, ticks=19457874/14616506,
       in_queue=34206098, util=99.68%
        43: interrupt in total: 887320
       fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting
       --ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16
       --ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0
       --filename=/dev/vdb --name=seq-read --stonewall --rw=read
       --name=seq-write --stonewall --rw=write --name=rnd-read --stonewall
       --rw=randread --name=rnd-write --stonewall --rw=randwrite
      
       After:
       -------------------------------------
         seq-read  : io=0 B, bw=309503KB/s, iops=77375 , runt= 54207msec
         seq-write : io=0 B, bw=448205KB/s, iops=112051 , runt= 37432msec
         rand-read : io=0 B, bw=311254KB/s, iops=77813 , runt= 53902msec
         rand-write: io=0 B, bw=377152KB/s, iops=94287 , runt= 44484msec
           clat (usec): min=81 , max=90588 , avg=12946.06, stdev=9085.94
           clat (usec): min=57 , max=72264 , avg=8967.97, stdev=5951.04
           clat (usec): min=29 , max=101046 , avg=12889.95, stdev=9067.91
           clat (usec): min=52 , max=106152 , avg=10660.56, stdev=4778.19
         cpu          : usr=15.05%, sys=57.92%, ctx=77109411, majf=0, minf=54
         cpu          : usr=26.78%, sys=101.40%, ctx=7387891, majf=0, minf=2
         cpu          : usr=19.03%, sys=58.17%, ctx=7681976, majf=0, minf=8
         cpu          : usr=24.65%, sys=58.34%, ctx=8442632, majf=0, minf=4
         vdb: ios=8389086/8361888, merge=0/0, ticks=17243780/12742010,
       in_queue=30078377, util=99.59%
        43: interrupt in total: 1259639
       fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting
       --ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16
       --ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0
       --filename=/dev/vdb --name=seq-read --stonewall --rw=read
       --name=seq-write --stonewall --rw=write --name=rnd-read --stonewall
       --rw=randread --name=rnd-write --stonewall --rw=randwrite
      Signed-off-by: NAsias He <asias@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      bb811108
    • D
      virtio-blk: fix NULL checking in virtblk_alloc_req() · f22cf8eb
      Dan Carpenter 提交于
      Smatch complains about the inconsistent NULL checking here.  Fix it to
      return NULL on failure.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (fixed accidental deletion)
      f22cf8eb
    • A
      virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path · c85a1f91
      Asias He 提交于
      We need to support both REQ_FLUSH and REQ_FUA for bio based path since
      it does not get the sequencing of REQ_FUA into REQ_FLUSH that request
      based drivers can request.
      
      REQ_FLUSH is emulated by:
      A) If the bio has no data to write:
      1. Send VIRTIO_BLK_T_FLUSH to device,
      2. In the flush I/O completion handler, finish the bio
      
      B) If the bio has data to write:
      1. Send VIRTIO_BLK_T_FLUSH to device
      2. In the flush I/O completion handler, send the actual write data to device
      3. In the write I/O completion handler, finish the bio
      
      REQ_FUA is emulated by:
      1. Send the actual write data to device
      2. In the write I/O completion handler, send VIRTIO_BLK_T_FLUSH to device
      3. In the flush I/O completion handler, finish the bio
      
      Changes in v7:
      - Using vbr->flags to trace request type
      - Dropped unnecessary struct virtio_blk *vblk parameter
      - Reuse struct virtblk_req in bio done function
      
      Cahnges in v6:
      - Reworked REQ_FLUSH and REQ_FUA emulatation order
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: virtualization@lists.linux-foundation.org
      Signed-off-by: NAsias He <asias@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      c85a1f91
    • A
      virtio-blk: Add bio-based IO path for virtio-blk · a98755c5
      Asias He 提交于
      This patch introduces bio-based IO path for virtio-blk.
      
      Compared to request-based IO path, bio-based IO path uses driver
      provided ->make_request_fn() method to bypasses the IO scheduler. It
      handles the bio to device directly without allocating a request in block
      layer. This reduces the IO path in guest kernel to achieve high IOPS
      and lower latency. The downside is that guest can not use the IO
      scheduler to merge and sort requests. However, this is not a big problem
      if the backend disk in host side uses faster disk device.
      
      When the bio-based IO path is not enabled, virtio-blk still uses the
      original request-based IO path, no performance difference is observed.
      
      Using a slow device e.g. normal SATA disk, the bio-based IO path for
      sequential read and write are slower than req-based IO path due to lack
      of merge in guest kernel. So we make the bio-based path optional.
      
      Performance evaluation:
      -----------------------------
      1) Fio test is performed in a 8 vcpu guest with ramdisk based guest using
      kvm tool.
      
      Short version:
       With bio-based IO path, sequential read/write, random read/write
       IOPS boost         : 28%, 24%, 21%, 16%
       Latency improvement: 32%, 17%, 21%, 16%
      
      Long version:
       With bio-based IO path:
        seq-read  : io=2048.0MB, bw=116996KB/s, iops=233991 , runt= 17925msec
        seq-write : io=2048.0MB, bw=100829KB/s, iops=201658 , runt= 20799msec
        rand-read : io=3095.7MB, bw=112134KB/s, iops=224268 , runt= 28269msec
        rand-write: io=3095.7MB, bw=96198KB/s,  iops=192396 , runt= 32952msec
          clat (usec): min=0 , max=2631.6K, avg=58716.99, stdev=191377.30
          clat (usec): min=0 , max=1753.2K, avg=66423.25, stdev=81774.35
          clat (usec): min=0 , max=2915.5K, avg=61685.70, stdev=120598.39
          clat (usec): min=0 , max=1933.4K, avg=76935.12, stdev=96603.45
        cpu : usr=74.08%, sys=703.84%, ctx=29661403, majf=21354, minf=22460954
        cpu : usr=70.92%, sys=702.81%, ctx=77219828, majf=13980, minf=27713137
        cpu : usr=72.23%, sys=695.37%, ctx=88081059, majf=18475, minf=28177648
        cpu : usr=69.69%, sys=654.13%, ctx=145476035, majf=15867, minf=26176375
       With request-based IO path:
        seq-read  : io=2048.0MB, bw=91074KB/s, iops=182147 , runt= 23027msec
        seq-write : io=2048.0MB, bw=80725KB/s, iops=161449 , runt= 25979msec
        rand-read : io=3095.7MB, bw=92106KB/s, iops=184211 , runt= 34416msec
        rand-write: io=3095.7MB, bw=82815KB/s, iops=165630 , runt= 38277msec
          clat (usec): min=0 , max=1932.4K, avg=77824.17, stdev=170339.49
          clat (usec): min=0 , max=2510.2K, avg=78023.96, stdev=146949.15
          clat (usec): min=0 , max=3037.2K, avg=74746.53, stdev=128498.27
          clat (usec): min=0 , max=1363.4K, avg=89830.75, stdev=114279.68
        cpu : usr=53.28%, sys=724.19%, ctx=37988895, majf=17531, minf=23577622
        cpu : usr=49.03%, sys=633.20%, ctx=205935380, majf=18197, minf=27288959
        cpu : usr=55.78%, sys=722.40%, ctx=101525058, majf=19273, minf=28067082
        cpu : usr=56.55%, sys=690.83%, ctx=228205022, majf=18039, minf=26551985
      
      2) Fio test is performed in a 8 vcpu guest with Fusion-IO based guest using
      kvm tool.
      
      Short version:
       With bio-based IO path, sequential read/write, random read/write
       IOPS boost         : 11%, 11%, 13%, 10%
       Latency improvement: 10%, 10%, 12%, 10%
      Long Version:
       With bio-based IO path:
        read : io=2048.0MB, bw=58920KB/s, iops=117840 , runt= 35593msec
        write: io=2048.0MB, bw=64308KB/s, iops=128616 , runt= 32611msec
        read : io=3095.7MB, bw=59633KB/s, iops=119266 , runt= 53157msec
        write: io=3095.7MB, bw=62993KB/s, iops=125985 , runt= 50322msec
          clat (usec): min=0 , max=1284.3K, avg=128109.01, stdev=71513.29
          clat (usec): min=94 , max=962339 , avg=116832.95, stdev=65836.80
          clat (usec): min=0 , max=1846.6K, avg=128509.99, stdev=89575.07
          clat (usec): min=0 , max=2256.4K, avg=121361.84, stdev=82747.25
        cpu : usr=56.79%, sys=421.70%, ctx=147335118, majf=21080, minf=19852517
        cpu : usr=61.81%, sys=455.53%, ctx=143269950, majf=16027, minf=24800604
        cpu : usr=63.10%, sys=455.38%, ctx=178373538, majf=16958, minf=24822612
        cpu : usr=62.04%, sys=453.58%, ctx=226902362, majf=16089, minf=23278105
       With request-based IO path:
        read : io=2048.0MB, bw=52896KB/s, iops=105791 , runt= 39647msec
        write: io=2048.0MB, bw=57856KB/s, iops=115711 , runt= 36248msec
        read : io=3095.7MB, bw=52387KB/s, iops=104773 , runt= 60510msec
        write: io=3095.7MB, bw=57310KB/s, iops=114619 , runt= 55312msec
          clat (usec): min=0 , max=1532.6K, avg=142085.62, stdev=109196.84
          clat (usec): min=0 , max=1487.4K, avg=129110.71, stdev=114973.64
          clat (usec): min=0 , max=1388.6K, avg=145049.22, stdev=107232.55
          clat (usec): min=0 , max=1465.9K, avg=133585.67, stdev=110322.95
        cpu : usr=44.08%, sys=590.71%, ctx=451812322, majf=14841, minf=17648641
        cpu : usr=48.73%, sys=610.78%, ctx=418953997, majf=22164, minf=26850689
        cpu : usr=45.58%, sys=581.16%, ctx=714079216, majf=21497, minf=22558223
        cpu : usr=48.40%, sys=599.65%, ctx=656089423, majf=16393, minf=23824409
      
      3) Fio test is performed in a 8 vcpu guest with normal SATA based guest
      using kvm tool.
      
      Short version:
       With bio-based IO path, sequential read/write, random read/write
       IOPS boost         : -10%, -10%, 4.4%, 0.5%
       Latency improvement: -12%, -15%, 2.5%, 0.8%
      Long Version:
       With bio-based IO path:
        read : io=124812KB, bw=36537KB/s, iops=9060 , runt=  3416msec
        write: io=169180KB, bw=24406KB/s, iops=6065 , runt=  6932msec
        read : io=256200KB, bw=2089.3KB/s, iops=520 , runt=122630msec
        write: io=257988KB, bw=1545.7KB/s, iops=384 , runt=166910msec
          clat (msec): min=1 , max=1527 , avg=28.06, stdev=89.54
          clat (msec): min=2 , max=344 , avg=41.12, stdev=38.70
          clat (msec): min=8 , max=1984 , avg=490.63, stdev=207.28
          clat (msec): min=33 , max=4131 , avg=659.19, stdev=304.71
        cpu          : usr=4.85%, sys=17.15%, ctx=31593, majf=0, minf=7
        cpu          : usr=3.04%, sys=11.45%, ctx=39377, majf=0, minf=0
        cpu          : usr=0.47%, sys=1.59%, ctx=262986, majf=0, minf=16
        cpu          : usr=0.47%, sys=1.46%, ctx=337410, majf=0, minf=0
      
       With request-based IO path:
        read : io=150120KB, bw=40420KB/s, iops=10037 , runt=  3714msec
        write: io=194932KB, bw=27029KB/s, iops=6722 , runt=  7212msec
        read : io=257136KB, bw=2001.1KB/s, iops=498 , runt=128443msec
        write: io=258276KB, bw=1537.2KB/s, iops=382 , runt=168028msec
          clat (msec): min=1 , max=1542 , avg=24.84, stdev=32.45
          clat (msec): min=3 , max=628 , avg=35.62, stdev=39.71
          clat (msec): min=8 , max=2540 , avg=503.28, stdev=236.97
          clat (msec): min=41 , max=4398 , avg=653.88, stdev=302.61
        cpu          : usr=3.91%, sys=15.75%, ctx=26968, majf=0, minf=23
        cpu          : usr=2.50%, sys=10.56%, ctx=19090, majf=0, minf=0
        cpu          : usr=0.16%, sys=0.43%, ctx=20159, majf=0, minf=16
        cpu          : usr=0.18%, sys=0.53%, ctx=81364, majf=0, minf=0
      
      How to use:
      -----------------------------
      Add 'virtio_blk.use_bio=1' to kernel cmdline or 'modprobe virtio_blk
      use_bio=1' to enable ->make_request_fn() based I/O path.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: virtualization@lists.linux-foundation.org
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMinchan Kim <minchan.kim@gmail.com>
      Signed-off-by: NAsias He <asias@redhat.com>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      a98755c5
  21. 30 7月, 2012 4 次提交
    • P
      virtio-blk: allow toggling host cache between writeback and writethrough · cd5d5038
      Paolo Bonzini 提交于
      This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature,
      which exposes the cache mode in the configuration space and lets the
      driver modify it.  The cache mode is exposed via sysfs.
      
      Even if the host does not support the new feature, the cache mode is
      visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      cd5d5038
    • A
      virtio-blk: Use block layer provided spinlock · 2c95a329
      Asias He 提交于
      Block layer will allocate a spinlock for the queue if the driver does
      not provide one in blk_init_queue().
      
      The reason to use the internal spinlock is that blk_cleanup_queue() will
      switch to use the internal spinlock in the cleanup code path.
      
              if (q->queue_lock != &q->__queue_lock)
                      q->queue_lock = &q->__queue_lock;
      
      However, processes which are in D state might have taken the driver
      provided spinlock, when the processes wake up, they would release the
      block provided spinlock.
      
      =====================================
      [ BUG: bad unlock balance detected! ]
      3.4.0-rc7+ #238 Not tainted
      -------------------------------------
      fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
      [<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
      but there are no more locks to release!
      
      other info that might help us debug this:
      1 lock held by fio/3587:
       #0:  (&(&vblk->lock)->rlock){......}, at:
      [<ffffffff8132661a>] get_request_wait+0x19a/0x250
      
      Other drivers use block layer provided spinlock as well, e.g. SCSI.
      
      Switching to the block layer provided spinlock saves a bit of memory and
      does not increase lock contention. Performance test shows no real
      difference is observed before and after this patch.
      
      Changes in v2: Improve commit log as Michael suggested.
      
      Cc: virtualization@lists.linux-foundation.org
      Cc: kvm@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: NAsias He <asias@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      2c95a329
    • A
      virtio-blk: Reset device after blk_cleanup_queue() · 483001c7
      Asias He 提交于
      blk_cleanup_queue() will call blk_drian_queue() to drain all the
      requests before queue DEAD marking. If we reset the device before
      blk_cleanup_queue() the drain would fail.
      
      1) if the queue is stopped in do_virtblk_request() because device is
      full, the q->request_fn() will not be called.
      
      blk_drain_queue() {
         while(true) {
            ...
            if (!list_empty(&q->queue_head))
              __blk_run_queue(q) {
      	    if (queue is not stoped)
      		q->request_fn()
      	}
            ...
         }
      }
      
      Do no reset the device before blk_cleanup_queue() gives the chance to
      start the queue in interrupt handler blk_done().
      
      2) In commit b79d866c, We abort requests
      dispatched to driver before blk_cleanup_queue(). There is a race if
      requests are dispatched to driver after the abort and before the queue
      DEAD mark. To fix this, instead of aborting the requests explicitly, we
      can just reset the device after after blk_cleanup_queue so that the
      device can complete all the requests before queue DEAD marking in the
      drain process.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: virtualization@lists.linux-foundation.org
      Cc: kvm@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: NAsias He <asias@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      483001c7
    • A
      virtio-blk: Call del_gendisk() before disable guest kick · 02e2b124
      Asias He 提交于
      del_gendisk() might not return due to failing to remove the
      /sys/block/vda/serial sysfs entry when another thread (udev) is
      trying to read it.
      
      virtblk_remove()
        vdev->config->reset() : guest will not kick us through interrupt
          del_gendisk()
            device_del()
              kobject_del(): got stuck, sysfs entry ref count non zero
      
      sysfs_open_file(): user space process read /sys/block/vda/serial
         sysfs_get_active() : got sysfs entry ref count
            dev_attr_show()
              virtblk_serial_show()
                 blk_execute_rq() : got stuck, interrupt is disabled
                                    request cannot be finished
      
      This patch fixes it by calling del_gendisk() before we disable guest's
      interrupt so that the request sent in virtblk_serial_show() will be
      finished and del_gendisk() will success.
      
      This fixes another race in hot-unplug process.
      
      It is save to call del_gendisk(vblk->disk) before
      flush_work(&vblk->config_work) which might access vblk->disk, because
      vblk->disk is not freed until put_disk(vblk->disk).
      
      Cc: virtualization@lists.linux-foundation.org
      Cc: kvm@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: NAsias He <asias@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      02e2b124
  22. 22 5月, 2012 2 次提交
    • A
      virtio_blk: Drop unused request tracking list · f65ca1dc
      Asias He 提交于
      Benchmark shows small performance improvement on fusion io device.
      
      Before:
        seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec
        seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec
        rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec
        rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec
      
      After:
        seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec
        seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec
        rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec
        rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec
      Signed-off-by: NAsias He <asias@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      f65ca1dc
    • A
      virtio-blk: Fix hot-unplug race in remove method · b79d866c
      Asias He 提交于
      If we reset the virtio-blk device before the requests already dispatched
      to the virtio-blk driver from the block layer are finised, we will stuck
      in blk_cleanup_queue() and the remove will fail.
      
      blk_cleanup_queue() calls blk_drain_queue() to drain all requests queued
      before DEAD marking. However it will never success if the device is
      already stopped. We'll have q->in_flight[] > 0, so the drain will not
      finish.
      
      How to reproduce the race:
      1. hot-plug a virtio-blk device
      2. keep reading/writing the device in guest
      3. hot-unplug while the device is busy serving I/O
      
      Test:
      ~1000 rounds of hot-plug/hot-unplug test passed with this patch.
      
      Changes in v3:
      - Drop blk_abort_queue and blk_abort_request
      - Use __blk_end_request_all to complete request dispatched to driver
      
      Changes in v2:
      - Drop req_in_flight
      - Use virtqueue_detach_unused_buf to get request dispatched to driver
      Signed-off-by: NAsias He <asias@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      b79d866c
  23. 12 4月, 2012 1 次提交
    • R
      virtio_blk: helper function to format disk names · c0aa3e09
      Ren Mingxin 提交于
      The current virtio block's naming algorithm just supports 18278
      (26^3 + 26^2 + 26) disks. If there are more virtio blocks,
      there will be disks with the same name.
      
      Based on commit 3e1a7ff8, add
      a function "virtblk_name_format()" for virtio block to support mass
      of disks naming.
      
      Notes:
      - Our naming scheme is ugly. We are stuck with it
        for virtio but don't use it for any new driver:
        new drivers should name their devices PREFIX%d
        where the sequence number can be allocated by ida
      - sd_format_disk_name has exactly the same logic.
        Moving it to a central place was deferred over worries
        that this will make people keep using the legacy naming
        in new drivers.
        We kept code idential in case someone wants to deduplicate later.
      Signed-off-by: NRen Mingxin <renmx@cn.fujitsu.com>
      Acked-by: NAsias He <asias@redhat.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      c0aa3e09
  24. 29 3月, 2012 1 次提交
  25. 15 1月, 2012 1 次提交
  26. 12 1月, 2012 3 次提交