1. 22 Jun 2015, 1 commit
    • drivers: xen-blkfront: only talk_to_blkback() when in XenbusStateInitialising · a9b54bb9
      Authored by Bob Liu
      Patch 69b91ede
      "drivers: xen-blkback: delay pending_req allocation to connect_ring"
      exposed a problem in Xen blkfront. There is a race between XenStored
      and the drivers such that we can see two state changes to
      XenbusStateInitWait ('2'):

      vbd vbd-268440320: blkfront:blkback_changed to state 2.
      vbd vbd-268440320: blkfront:blkback_changed to state 2.
      vbd vbd-268440320: blkfront:blkback_changed to state 4.

      As a result, blkback_changed() receives two notifications and calls
      setup_blkring() twice.

      The backend, meanwhile, may act only on the first setup_blkring() and
      read out-dated ring-ref values (or read them while they are being
      updated), which is wrong.

      The end result is that the ring ends up being incorrectly set up.
      
      The other drivers in the tree already have such a check in place; a
      minimal sketch of the guard follows this entry.
      Reported-and-Tested-by: Robert Butera <robert.butera@oracle.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      a9b54bb9
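      The essence of the fix is a guard in the frontend's state-change
      handler: a repeated XenbusStateInitWait notification is ignored unless
      the frontend itself is still in XenbusStateInitialising. A minimal
      sketch of that guard (names follow drivers/block/xen-blkfront.c; the
      surrounding cases and error handling are abridged):

      /* Sketch only: the real handler has more cases and error handling. */
      static void blkback_changed(struct xenbus_device *dev,
                                  enum xenbus_state backend_state)
      {
          struct blkfront_info *info = dev_get_drvdata(&dev->dev);

          switch (backend_state) {
          case XenbusStateInitWait:
              /*
               * A second InitWait notify arrives after we have already left
               * XenbusStateInitialising; skip setup in that case so
               * setup_blkring() is not run twice.
               */
              if (dev->state != XenbusStateInitialising)
                  break;
              talk_to_blkback(dev, info);
              break;
          default:
              break;
          }
      }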
  2. 06 Jun 2015, 8 commits
    • xen/block: add multi-page ring support · 86839c56
      Authored by Bob Liu
      Extend xen/block to support multi-page rings, so that more requests
      can be in flight by using more than one page as the request ring
      between blkfront and the backend.
      As a result, performance can improve significantly.

      We saw some impressive improvements on our high-end iSCSI storage
      cluster backend: using 64 pages as the ring, IOPS increased about 15
      times in the throughput test and more than doubled in the latency test.

      The reason is that a single-page ring limits outstanding requests to
      32, but in our case the iSCSI LUN was spread across about 100 physical
      drives, and 32 was really not enough to keep them busy.
      
      Changes in v2:
       - Rebased to 4.0-rc6.
       - Documented how the multi-page ring feature works in linux io/blkif.h.

      Changes in v3:
       - Removed the changes to linux io/blkif.h and followed the protocol
         defined in io/blkif.h of the XEN tree.
       - Rebased to 4.1-rc3.

      Changes in v4:
       - Switched to 'ring-page-order' and 'max-ring-page-order'.
       - Addressed a few comments from Roger.

      Changes in v5:
       - Clarified the 4K granularity in the comments.
       - Addressed more comments from Roger.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      86839c56
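      The negotiation itself goes through xenstore: the backend advertises
      'max-ring-page-order' and the frontend answers with the
      'ring-page-order' it actually uses, plus one grant reference per ring
      page. A hedged sketch of the frontend side (xenbus_scanf and
      xenbus_printf are the real xenbus accessors; negotiate_ring_pages and
      the omitted grant handling are illustrative):

      /* Sketch, not the driver's exact code. */
      static int negotiate_ring_pages(struct xenbus_device *dev,
                                      unsigned int max_order_limit)
      {
          unsigned int max_order = 0, order;
          int err;

          /* Backend publishes the largest ring it supports (log2 of pages). */
          err = xenbus_scanf(XBT_NIL, dev->otherend,
                             "max-ring-page-order", "%u", &max_order);
          if (err != 1)
              max_order = 0;  /* legacy backend: single-page ring only */

          order = min(max_order, max_order_limit);

          /* Frontend publishes the order it actually uses... */
          err = xenbus_printf(XBT_NIL, dev->nodename,
                              "ring-page-order", "%u", order);
          if (err)
              return err;

          /* ...then writes one "ring-refN" entry per ring page (omitted). */
          return 0;
      }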
    • driver: xen-blkfront: move talk_to_blkback to a more suitable place · 8ab0144a
      Authored by Bob Liu
      The major responsibility of talk_to_blkback() is to allocate and
      initialize the request ring and write the ring info to xenstore.
      But this work should only be done after the backend has entered
      'XenbusStateInitWait', as defined in the protocol file.
      See xen/include/public/io/blkif.h in XEN git tree:
      Front                                Back
      =================================    =====================================
      XenbusStateInitialising              XenbusStateInitialising
       o Query virtual device               o Query backend device identification
         properties.                          data.
       o Setup OS device instance.          o Open and validate backend device.
                                            o Publish backend features and
                                              transport parameters.
                                                           |
                                                           |
                                                           V
                                           XenbusStateInitWait
      
      o Query backend features and
        transport parameters.
      o Allocate and initialize the
        request ring.
      
      There is no problem with this yet, but it is a violation of the
      design, and furthermore it would not allow the frontend and backend to
      negotiate the 'multi-page' and 'multi-queue' features.
      
      Changes in v2:
       - Re-wrote the commit message to be clearer.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      8ab0144a
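      Concretely, the move means talk_to_blkback() is no longer called from
      the probe path; the frontend waits until the backend has reached
      XenbusStateInitWait, exactly as the diagram prescribes. A schematic
      sketch (abridged; names follow xen-blkfront):

      /* Sketch: probe now only sets up the OS device instance... */
      static int blkfront_probe(struct xenbus_device *dev,
                                const struct xenbus_device_id *id)
      {
          /* allocate struct blkfront_info, query virtual device properties */
          /* no talk_to_blkback() here anymore */
          return 0;
      }

      /* ...while the ring work moves into the XenbusStateInitWait case of
       * blkback_changed(), as sketched under the a9b54bb9 entry above. */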
    • drivers: xen-blkback: delay pending_req allocation to connect_ring · 69b91ede
      Authored by Bob Liu
      This is a preparatory patch for the multi-page ring feature.
      In connect_ring we know exactly how many pages are used for the shared
      ring, so delay the pending_req allocation until then so that we don't
      waste too much memory (a sizing sketch follows this entry).
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      69b91ede
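      A minimal sketch of the idea, which the multi-page series then builds
      on: size the pending_req pool from the ring size once connect_ring()
      knows it, instead of allocating a worst-case pool at probe time
      (XEN_BLKIF_REQS_PER_PAGE and the field names here are illustrative,
      not the driver's exact macros):

      /* Sketch only. */
      static int connect_ring(struct backend_info *be)
      {
          unsigned int nr_ring_pages = 1;  /* from "ring-page-order" */
          unsigned int nr_reqs;

          /* ... read ring-page-order and ring-refs from xenstore ... */

          nr_reqs = nr_ring_pages * XEN_BLKIF_REQS_PER_PAGE;
          be->blkif->pending_reqs = kcalloc(nr_reqs,
                                            sizeof(*be->blkif->pending_reqs),
                                            GFP_KERNEL);
          if (!be->blkif->pending_reqs)
              return -ENOMEM;

          /* ... map the ring pages and start the dispatch thread ... */
          return 0;
      }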
    • NVMe: Automatic namespace rescan · a5768aa8
      Authored by Keith Busch
      Namespaces may be dynamically allocated and deleted or attached and
      detached. This has the driver rescan the device for namespace changes
      after each device reset or namespace change asynchronous event.
      
      There could potentially be many detached namespaces that we don't want
      polluting /dev/ with unusable block handles, so this will delete disks
      if the namespace is not active as indicated by the response from identify
      namespace. This also skips adding the disk if no capacity is provisioned
      to the namespace in the first place.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      a5768aa8
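      The scan policy reduces to: for each namespace ID, remove the disk if
      Identify Namespace reports it inactive, revalidate it if it is still
      attached, and skip creating a disk when no capacity is provisioned. A
      hedged sketch (the nvme_* helpers here are illustrative placeholders,
      not the driver's exact functions):

      /* Sketch of the per-namespace decision, not the driver's exact code. */
      static void nvme_scan_ns(struct nvme_dev *dev, unsigned int nsid)
      {
          struct nvme_ns *ns = nvme_find_ns(dev, nsid);   /* hypothetical */

          if (ns) {
              if (!nvme_ns_active(ns))      /* per Identify Namespace data */
                  nvme_ns_remove(ns);       /* drop the stale /dev handle */
              else
                  revalidate_disk(ns->disk);
          } else if (nvme_ns_capacity(dev, nsid) > 0) {
              nvme_alloc_ns(dev, nsid);     /* attach a newly active ns */
          }
      }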
    • Merge branch 'for-4.2/core' into for-4.2/drivers · b281ebb8
      Authored by Jens Axboe
      b281ebb8
    • block: add blk_set_queue_dying() to blkdev.h · 3f21c265
      Authored by Jens Axboe
      We export this function and NVMe wants to use it, but for some reason
      it was never added to the block header. Do that.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      3f21c265
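      The change itself is essentially one line in include/linux/blkdev.h,
      along these lines (sketched from the commit description):

      /* Make the already-exported symbol visible to other modules. */
      extern void blk_set_queue_dying(struct request_queue *q);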
    • NVMe: Memory barrier before queue_count is incremented · 36a7e993
      Authored by Jon Derrick
      Protects against reordering and/or preemption that would allow the
      kthread to access the queue descriptor before it is fully set up.
      Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
      Acked-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      36a7e993
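      The pattern is the usual publish ordering: finish initializing the
      object, issue a barrier, then bump the counter that the polling
      kthread reads. A hedged sketch (field names loosely follow the
      4.2-era nvme driver; the exact barrier flavor used by the commit is
      not shown here):

      /* Sketch only. */
      static struct nvme_queue *nvme_alloc_queue(struct nvme_dev *dev, int qid)
      {
          struct nvme_queue *nvmeq = kzalloc(sizeof(*nvmeq), GFP_KERNEL);

          if (!nvmeq)
              return NULL;
          /* ... allocate CQ/SQ memory, init locks, set nvmeq->qid ... */
          dev->queues[qid] = nvmeq;

          /*
           * Make the fully initialized descriptor visible before the
           * kthread can observe it via dev->queue_count.
           */
          mb();
          dev->queue_count++;

          return nvmeq;
      }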
    • NVMe: add sysfs and ioctl controller reset · 4cc06521
      Authored by Keith Busch
      We need the ability to perform an nvme controller reset as discussed on
      the mailing list thread:
      
        http://lists.infradead.org/pipermail/linux-nvme/2015-March/001585.html
      
      This adds a sysfs entry that, when written to, will perform an NVMe
      controller reset if the controller was successfully initialized in the
      first place.
      
      This also adds locking around resetting the device in the async probe
      method so the driver can't schedule two resets.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Cc: Brandon Schultz <brandon.schulz@hgst.com>
      Cc: David Sariel <david.sariel@pmcs.com>
      
      Updated by Jens to:
      
      1) Merge this with the ioctl reset patch from David Sariel. The ioctl
         path now shares the reset code from the sysfs path.
      
      2) Don't flush work if we fail issuing the reset.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      4cc06521
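      A hedged sketch of the sysfs side: a write-only device attribute whose
      store method invokes the shared reset path (the attribute and helper
      names are illustrative assumptions, not the commit's exact plumbing):

      /* Sketch only. */
      static ssize_t reset_controller_store(struct device *dev,
                                            struct device_attribute *attr,
                                            const char *buf, size_t count)
      {
          struct nvme_dev *ndev = dev_get_drvdata(dev);
          int ret = __nvme_reset(ndev);  /* errors if not initialized */

          return ret ? ret : count;
      }
      static DEVICE_ATTR_WO(reset_controller);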
  3. 02 Jun 2015, 5 commits
    • null_blk: restart request processing on completion handler · 8b70f45e
      Authored by Akinobu Mita
      When irqmode=2 (IRQ completion handler is timer) and queue_mode=1
      (block interface to use is rq), the completion handler should restart
      request handling for any pending requests on a queue, because request
      processing stops once more commands are queued than hw_queue_depth
      (null_rq_prep_fn returns BLKPREP_DEFER).
      
      Without this change, the following command cannot finish.
      
      	# modprobe null_blk irqmode=2 queue_mode=1 hw_queue_depth=1
      	# fio --name=t --rw=read --size=1g --direct=1 \
      	  --ioengine=libaio --iodepth=64 --filename=/dev/nullb0
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      8b70f45e
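      The fix amounts to kicking the request queue from the timer completion
      path so requests deferred with BLKPREP_DEFER are prepped again. A
      hedged sketch for the queue_mode=1 (rq) path against the 4.2-era
      single-queue API (blk_start_queue() must be called with the queue lock
      held; the function name is illustrative):

      /* Sketch only: completion side of the timer handler. */
      static void null_complete_rq_cmd(struct nullb_cmd *cmd)
      {
          struct request_queue *q = cmd->rq->q;
          unsigned long flags;

          blk_end_request_all(cmd->rq, 0);    /* complete the request */

          /* Restart ->request_fn processing for any deferred requests. */
          spin_lock_irqsave(q->queue_lock, flags);
          blk_start_queue(q);
          spin_unlock_irqrestore(q->queue_lock, flags);
      }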
    • null_blk: prevent timer handler running on a different CPU where started · 419c21a3
      Authored by Akinobu Mita
      When irqmode=2 (IRQ completion handler is timer), the timer handler
      should be called on the same CPU where the timer was started.

      Since completion_queues are per-CPU and the completion handler only
      touches the completion_queue of the local CPU, we need to prevent the
      handler from running on a CPU other than the one where the timer was
      started. Otherwise, the IO cannot be completed until another
      completion handler happens to run on that CPU.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Jens Axboe <axboe@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      419c21a3
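      With hrtimers, the standard way to achieve this is to arm the timer in
      pinned mode so the expiry handler stays on the starting CPU. A hedged
      sketch (the wrapper function is illustrative; HRTIMER_MODE_REL_PINNED
      is the stock hrtimer flag for this):

      /* Sketch only. */
      static void null_cmd_start_timer(struct hrtimer *timer, u64 delay_ns)
      {
          ktime_t kt = ktime_set(0, delay_ns);

          /* _PINNED keeps the timer, and thus the completion handler,
           * on the CPU that filled the per-CPU completion_queue. */
          hrtimer_start(timer, kt, HRTIMER_MODE_REL_PINNED);
      }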
    • NVMe: Remove hctx reliance for multi-namespace · 42483228
      Authored by Keith Busch
      The driver needs to track shared tags to support multiple namespaces
      that may be dynamically allocated or deleted. Relying on the first
      request_queue's hctxs is not appropriate, as we cannot clear
      outstanding tags for all namespaces using that handle, nor can the
      driver easily track every request_queue's hctx as namespaces are
      attached/detached. Instead, this patch gets the shared tag resources
      from the nvme_dev's tagset rather than through a request_queue hctx.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      42483228
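      Sketched against the companion "blk-mq: Shared tag enhancements"
      commit below (f26cdc85), which added blk_mq_all_tag_busy_iter(): the
      driver can cancel outstanding IOs through the device-wide tagset
      rather than through any namespace's hctx (nvme_cancel_queue_ios stands
      in for the driver's per-request cancel callback):

      /* Sketch only. */
      static void nvme_clear_queue(struct nvme_queue *nvmeq)
      {
          struct blk_mq_tags *tags = nvmeq->dev->tagset.tags[0];

          if (tags)
              blk_mq_all_tag_busy_iter(tags, nvme_cancel_queue_ios, nvmeq);
      }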
    • Merge branch 'for-4.2/core' into for-4.2/drivers · 843e8ddb
      Authored by Jens Axboe
      843e8ddb
    • blk-mq: Shared tag enhancements · f26cdc85
      Authored by Keith Busch
      Storage controllers may expose multiple block devices that share
      hardware resources managed by blk-mq. This patch enhances the shared
      tags so a low-level driver can access the shared resources that are
      not tied to the unshared h/w contexts. This way the LLD can
      dynamically add and delete disks and request queues without having to
      track all the request_queue hctxs in order to iterate outstanding
      tags.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      f26cdc85
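      From the LLD's perspective the enhancement boils down to an iterator
      over a tagset's busy tags that needs no request_queue hctx at all. A
      hedged sketch of how a driver might use it (the callback signature
      follows the 4.2-era blk-mq API; the lld_* names are illustrative):

      /* Sketch only. */
      static void lld_cancel_rq(struct request *rq, void *data, bool reserved)
      {
          /* driver-specific: fail or requeue 'rq' (illustrative) */
      }

      static void lld_cancel_all(struct blk_mq_tag_set *set)
      {
          int i;

          for (i = 0; i < set->nr_hw_queues; i++)
              if (set->tags[i])
                  blk_mq_all_tag_busy_iter(set->tags[i], lld_cancel_rq, NULL);
      }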
  4. 30 May 2015, 4 commits
  5. 27 May 2015, 1 commit
  6. 23 May 2015, 1 commit
  7. 22 May 2015, 11 commits
  8. 20 May 2015, 9 commits