1. 30 7月, 2012 3 次提交
    • A
      virtio-blk: Use block layer provided spinlock · 2c95a329
      Asias He 提交于
      Block layer will allocate a spinlock for the queue if the driver does
      not provide one in blk_init_queue().
      
      The reason to use the internal spinlock is that blk_cleanup_queue() will
      switch to use the internal spinlock in the cleanup code path.
      
              if (q->queue_lock != &q->__queue_lock)
                      q->queue_lock = &q->__queue_lock;
      
      However, processes which are in D state might have taken the driver
      provided spinlock, when the processes wake up, they would release the
      block provided spinlock.
      
      =====================================
      [ BUG: bad unlock balance detected! ]
      3.4.0-rc7+ #238 Not tainted
      -------------------------------------
      fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
      [<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
      but there are no more locks to release!
      
      other info that might help us debug this:
      1 lock held by fio/3587:
       #0:  (&(&vblk->lock)->rlock){......}, at:
      [<ffffffff8132661a>] get_request_wait+0x19a/0x250
      
      Other drivers use block layer provided spinlock as well, e.g. SCSI.
      
      Switching to the block layer provided spinlock saves a bit of memory and
      does not increase lock contention. Performance test shows no real
      difference is observed before and after this patch.
      
      Changes in v2: Improve commit log as Michael suggested.
      
      Cc: virtualization@lists.linux-foundation.org
      Cc: kvm@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: NAsias He <asias@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      2c95a329
    • A
      virtio-blk: Reset device after blk_cleanup_queue() · 483001c7
      Asias He 提交于
      blk_cleanup_queue() will call blk_drian_queue() to drain all the
      requests before queue DEAD marking. If we reset the device before
      blk_cleanup_queue() the drain would fail.
      
      1) if the queue is stopped in do_virtblk_request() because device is
      full, the q->request_fn() will not be called.
      
      blk_drain_queue() {
         while(true) {
            ...
            if (!list_empty(&q->queue_head))
              __blk_run_queue(q) {
      	    if (queue is not stoped)
      		q->request_fn()
      	}
            ...
         }
      }
      
      Do no reset the device before blk_cleanup_queue() gives the chance to
      start the queue in interrupt handler blk_done().
      
      2) In commit b79d866c, We abort requests
      dispatched to driver before blk_cleanup_queue(). There is a race if
      requests are dispatched to driver after the abort and before the queue
      DEAD mark. To fix this, instead of aborting the requests explicitly, we
      can just reset the device after after blk_cleanup_queue so that the
      device can complete all the requests before queue DEAD marking in the
      drain process.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: virtualization@lists.linux-foundation.org
      Cc: kvm@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: NAsias He <asias@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      483001c7
    • A
      virtio-blk: Call del_gendisk() before disable guest kick · 02e2b124
      Asias He 提交于
      del_gendisk() might not return due to failing to remove the
      /sys/block/vda/serial sysfs entry when another thread (udev) is
      trying to read it.
      
      virtblk_remove()
        vdev->config->reset() : guest will not kick us through interrupt
          del_gendisk()
            device_del()
              kobject_del(): got stuck, sysfs entry ref count non zero
      
      sysfs_open_file(): user space process read /sys/block/vda/serial
         sysfs_get_active() : got sysfs entry ref count
            dev_attr_show()
              virtblk_serial_show()
                 blk_execute_rq() : got stuck, interrupt is disabled
                                    request cannot be finished
      
      This patch fixes it by calling del_gendisk() before we disable guest's
      interrupt so that the request sent in virtblk_serial_show() will be
      finished and del_gendisk() will success.
      
      This fixes another race in hot-unplug process.
      
      It is save to call del_gendisk(vblk->disk) before
      flush_work(&vblk->config_work) which might access vblk->disk, because
      vblk->disk is not freed until put_disk(vblk->disk).
      
      Cc: virtualization@lists.linux-foundation.org
      Cc: kvm@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: NAsias He <asias@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      02e2b124
  2. 22 5月, 2012 2 次提交
    • A
      virtio_blk: Drop unused request tracking list · f65ca1dc
      Asias He 提交于
      Benchmark shows small performance improvement on fusion io device.
      
      Before:
        seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec
        seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec
        rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec
        rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec
      
      After:
        seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec
        seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec
        rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec
        rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec
      Signed-off-by: NAsias He <asias@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      f65ca1dc
    • A
      virtio-blk: Fix hot-unplug race in remove method · b79d866c
      Asias He 提交于
      If we reset the virtio-blk device before the requests already dispatched
      to the virtio-blk driver from the block layer are finised, we will stuck
      in blk_cleanup_queue() and the remove will fail.
      
      blk_cleanup_queue() calls blk_drain_queue() to drain all requests queued
      before DEAD marking. However it will never success if the device is
      already stopped. We'll have q->in_flight[] > 0, so the drain will not
      finish.
      
      How to reproduce the race:
      1. hot-plug a virtio-blk device
      2. keep reading/writing the device in guest
      3. hot-unplug while the device is busy serving I/O
      
      Test:
      ~1000 rounds of hot-plug/hot-unplug test passed with this patch.
      
      Changes in v3:
      - Drop blk_abort_queue and blk_abort_request
      - Use __blk_end_request_all to complete request dispatched to driver
      
      Changes in v2:
      - Drop req_in_flight
      - Use virtqueue_detach_unused_buf to get request dispatched to driver
      Signed-off-by: NAsias He <asias@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      b79d866c
  3. 12 4月, 2012 1 次提交
    • R
      virtio_blk: helper function to format disk names · c0aa3e09
      Ren Mingxin 提交于
      The current virtio block's naming algorithm just supports 18278
      (26^3 + 26^2 + 26) disks. If there are more virtio blocks,
      there will be disks with the same name.
      
      Based on commit 3e1a7ff8, add
      a function "virtblk_name_format()" for virtio block to support mass
      of disks naming.
      
      Notes:
      - Our naming scheme is ugly. We are stuck with it
        for virtio but don't use it for any new driver:
        new drivers should name their devices PREFIX%d
        where the sequence number can be allocated by ida
      - sd_format_disk_name has exactly the same logic.
        Moving it to a central place was deferred over worries
        that this will make people keep using the legacy naming
        in new drivers.
        We kept code idential in case someone wants to deduplicate later.
      Signed-off-by: NRen Mingxin <renmx@cn.fujitsu.com>
      Acked-by: NAsias He <asias@redhat.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      c0aa3e09
  4. 29 3月, 2012 1 次提交
  5. 15 1月, 2012 1 次提交
  6. 12 1月, 2012 4 次提交
  7. 02 11月, 2011 1 次提交
  8. 01 11月, 2011 1 次提交
  9. 31 10月, 2011 1 次提交
  10. 30 5月, 2011 2 次提交
  11. 21 10月, 2010 1 次提交
  12. 10 10月, 2010 1 次提交
  13. 10 9月, 2010 3 次提交
    • T
      virtio_blk: drop REQ_HARDBARRIER support · 02c42b7a
      Tejun Heo 提交于
      Remove now unused REQ_HARDBARRIER support.  virtio_blk already
      supports REQ_FLUSH and the usefulness of REQ_FUA for virtio_blk is
      questionable at this point, so there's nothing else to do to support
      new REQ_FLUSH/FUA interface.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      02c42b7a
    • T
      block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush() · 4913efe4
      Tejun Heo 提交于
      Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
      requests.  Deprecate barrier.  All REQ_HARDBARRIERs are failed with
      -EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
      blk_queue_flush().
      
      blk_queue_flush() takes combinations of REQ_FLUSH and FUA.  If a
      device has write cache and can flush it, it should set REQ_FLUSH.  If
      the device can handle FUA writes, it should also set REQ_FUA.
      
      All blk_queue_ordered() users are converted.
      
      * ORDERED_DRAIN is mapped to 0 which is the default value.
      * ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
      * ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NBoaz Harrosh <bharrosh@panasas.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Pierre Ossman <drzeus@drzeus.cx>
      Cc: Stefan Weinhuber <wein@de.ibm.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      4913efe4
    • T
      block: kill QUEUE_ORDERED_BY_TAG · 6958f145
      Tejun Heo 提交于
      Nobody is making meaningful use of ORDERED_BY_TAG now and queue
      draining for barrier requests will be removed soon which will render
      the advantage of tag ordering moot.  Kill ORDERED_BY_TAG.  The
      following users are affected.
      
      * brd: converted to ORDERED_DRAIN.
      * virtio_blk: ORDERED_TAG path was already marked deprecated.  Removed.
      * xen-blkfront: ORDERED_TAG case dropped.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      6958f145
  14. 08 8月, 2010 5 次提交
  15. 05 8月, 2010 3 次提交
    • R
      virtio_blk: Remove VBID ioctl · 6c99a852
      Ryan Harper 提交于
      With the availablility of a sysfs device attribute for examining disk serial
      numbers the ioctl is no longer needed.  The user-space changes for this aren't
      upstream yet so we don't have any users to worry about.
      Signed-off-by: NRyan Harper <ryanh@us.ibm.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      6c99a852
    • R
      virtio_blk: Add 'serial' attribute to virtio-blk devices (v2) · a5eb9e4f
      Ryan Harper 提交于
      Create a new attribute for virtio-blk devices that will fetch the serial number
      of the block device.  This attribute can be used by udev to create disk/by-id
      symlinks for devices that don't have a UUID (filesystem) associated with them.
      
      ATA_IDENTIFY strings are special in that they can be up to 20 chars long
      and aren't required to be nul-terminated.  The buffer is also zero-padded
      meaning that if the serial is 19 chars or less that we get a nul-terminated
      string.  When copying this value into a string buffer, we must be careful to
      copy up to the nul (if it present) and only 20 if it is longer and not to
      attempt to nul terminate; this isn't needed.
      
      Changes since v1:
      - Added BUILD_BUG_ON() for PAGE_SIZE check
      - Removed min() since BUILD_BUG_ON() handles the check
      - Replaced serial_sysfs() by copying id directly to buffer
      Signed-off-by: NRyan Harper <ryanh@us.ibm.com>
      Signed-off-by: Njohn cooper <john.cooper@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      a5eb9e4f
    • C
      virtio_blk: support barriers without FLUSH feature · 10bc310c
      Christoph Hellwig 提交于
      If we want to support barriers with the cache=writethrough mode in qemu
      we need to tell the block layer that we only need queue drains to
      implement a barrier.  Follow the model set by SCSI and IDE and assume
      that there is no volatile write cache if the host doesn't advertize it.
      While this might imply working barriers on old qemu versions or other
      hypervisors that actually have a volatile write cache this is only a
      cosmetic issue - these hypervisors don't guarantee any data integrity
      with or without this patch, but with the patch we at least provide
      data ordering.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      10bc310c
  16. 03 6月, 2010 1 次提交
  17. 19 5月, 2010 4 次提交
  18. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  19. 15 3月, 2010 1 次提交
  20. 24 2月, 2010 1 次提交
  21. 11 1月, 2010 1 次提交
    • M
      block: make virtio device id constant · 47483e25
      Márton Németh 提交于
      The id_table field of the struct virtio_driver is constant in <linux/virtio.h>
      so it is worth to make id_table also constant.
      
      The semantic match that finds this kind of pattern is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r@
      disable decl_init,const_decl_init;
      identifier I1, I2, x;
      @@
      	struct I1 {
      	  ...
      	  const struct I2 *x;
      	  ...
      	};
      @s@
      identifier r.I1, y;
      identifier r.x, E;
      @@
      	struct I1 y = {
      	  .x = E,
      	};
      @c@
      identifier r.I2;
      identifier s.E;
      @@
      	const struct I2 E[] = ... ;
      @depends on !c@
      identifier r.I2;
      identifier s.E;
      @@
      +	const
      	struct I2 E[] = ...;
      // </smpl>
      Signed-off-by: NMárton Németh <nm127@freemail.hu>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: cocci@diku.dk
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      47483e25
  22. 22 10月, 2009 1 次提交
    • R
      virtio_blk: Revert serial number support · 3225beab
      Rusty Russell 提交于
      This reverts "Add serial number support for virtio_blk, V4a".
      
      Turns out that virtio_pci, lguest and s/390 all have an 8 bit limit
      on virtio config space, so noone could ever use this.
      
      This is coming back later in a cleaner form.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: john cooper <john.cooper@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      3225beab