1. 30 Jun, 2016 1 commit
    • B
      xen-blkfront: save uncompleted reqs in blkfront_resume() · 7b427a59
      Bob Liu committed
      Uncompleted reqs used to be 'saved and resubmitted' in blkfront_recover() during
      migration, but that's too late after multi-queue was introduced.
      
      After a migrate to another host (which may not have multiqueue support), the
      number of rings (block hardware queues) may be changed and the ring and shadow
      structure will also be reallocated.
      
      blkfront_recover() then can't 'save and resubmit' the real
      uncompleted reqs because the shadow structure has been reallocated.
      
      This patch fixes this issue by moving the 'save' logic out of
      blkfront_recover() to an earlier place in blkfront_resume().
      
      The 'resubmit' logic is unchanged and remains in blkfront_recover().
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: stable@vger.kernel.org
      7b427a59
  2. 09 Jun, 2016 2 commits
    • B
      xen-blkfront: fix resume issues after a migration · 2a6f71ad
      Bob Liu committed
      After a migrate to another host (which may not have multiqueue
      support), the number of rings (block hardware queues)
      may be changed and the ring info structure will also be reallocated.
      
      This patch fixes two related bugs:
       * Call blk_mq_update_nr_hw_queues() to let blk-core know that the
         number of hardware queues has changed.
       * Don't store the rinfo pointer in hctx->driver_data, because rinfo may
         be reallocated; use hctx->queue_num to look up the rinfo structure instead.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2a6f71ad
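The second fix in the commit above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `struct ring_info`, `struct front_info`, `get_rinfo()` and `realloc_rings()` are hypothetical stand-ins and not the driver's actual definitions. It shows why resolving the ring from a queue index at use time stays valid across a reallocation, whereas a pointer cached before the reallocation would dangle.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct ring_info {
    unsigned int id;
};

struct front_info {
    unsigned int nr_rings;
    struct ring_info *rinfo;    /* may be freed and reallocated on resume */
};

/*
 * Resolve the ring from the queue index every time it is needed,
 * instead of caching a pointer that a later reallocation invalidates.
 */
static struct ring_info *get_rinfo(struct front_info *info,
                                   unsigned int queue_num)
{
    if (queue_num >= info->nr_rings)
        return NULL;
    return &info->rinfo[queue_num];
}

/* Model of a resume: the ring array is replaced, possibly with a new count. */
static int realloc_rings(struct front_info *info, unsigned int nr_rings)
{
    struct ring_info *fresh;
    unsigned int i;

    if (nr_rings == 0)
        return -EINVAL;

    fresh = calloc(nr_rings, sizeof(*fresh));
    if (!fresh)
        return -ENOMEM;

    for (i = 0; i < nr_rings; i++)
        fresh[i].id = i;

    free(info->rinfo);          /* any pointer cached before this is now stale */
    info->rinfo = fresh;
    info->nr_rings = nr_rings;
    return 0;
}
```

In this model, a caller that kept a `struct ring_info *` from before `realloc_rings()` would be reading freed memory, while a caller that kept only the index gets either the new ring or `NULL` when the ring count shrank.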
    • B
      xen-blkfront: don't call talk_to_blkback when already connected to blkback · efd15352
      Bob Liu committed
      Sometimes blkfront may receive the blkback_changed() notification
      (XenbusStateConnected) twice after migration, which causes
      talk_to_blkback() to be called twice as well and confuses xen-blkback.
      
      The flow is as follows:
         blkfront                                        blkback
      blkfront_resume()
       > talk_to_blkback()
        > Set blkfront to XenbusStateInitialised
                                                      front_changed()
                                                       > Connect()
                                                        > Set blkback to XenbusStateConnected
      
      blkback_changed()
       > Skip talk_to_blkback()
         because frontstate == XenbusStateInitialised
       > blkfront_connect()
        > Set blkfront to XenbusStateConnected
      
      -----
      And here we get another XenbusStateConnected notification leading
      to:
      -----
      blkback_changed()
       > because now frontstate != XenbusStateInitialised
         talk_to_blkback() is also called again
        > blkfront state changed from
        XenbusStateConnected to XenbusStateInitialised
          (Which is not correct!)
      
                                                      front_changed():
                                                       > Do nothing because blkback
                                                         already in XenbusStateConnected
      
      Now blkback is in XenbusStateConnected but blkfront is still
      in XenbusStateInitialised - leading to no disks.
      
      Poking of the XenbusStateConnected state is allowed (to deal with
      block disk change) and has to be dealt with. The most likely
      cause of this bug is custom udev scripts hooking up the disks
      and then validating the size.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      efd15352
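The flow described in the commit above can be modeled as a tiny state machine. The following is a simplified sketch, not the driver code: `struct front_model`, `talk_to_back()`, `front_connect()` and `on_backend_connected()` are hypothetical names, and the handshake is reduced to a counter. The enum values follow the state numbers that appear in the logs elsewhere in this history (2 for InitWait, 4 for Connected). The guard skips the handshake when the frontend is already Initialised or already Connected, so a repeated Connected notification no longer drags the frontend back to Initialised.

```c
/* Xenbus states relevant to this flow. */
enum xs_state {
    XS_INITIALISING = 1,
    XS_INITWAIT     = 2,
    XS_INITIALISED  = 3,
    XS_CONNECTED    = 4
};

/* Hypothetical frontend model: current state plus a handshake counter. */
struct front_model {
    enum xs_state state;
    int handshakes;
};

/* Stand-in for the handshake: publishes ring info, moves to Initialised. */
static void talk_to_back(struct front_model *f)
{
    f->handshakes++;
    f->state = XS_INITIALISED;
}

/* Stand-in for the final connection step. */
static void front_connect(struct front_model *f)
{
    f->state = XS_CONNECTED;
}

/* Handler for a "backend is Connected" notification. */
static void on_backend_connected(struct front_model *f)
{
    /*
     * Redo the handshake only if it has not happened yet. If the
     * frontend is already Initialised or Connected, a second
     * notification must not reset its state.
     */
    if (f->state != XS_INITIALISED && f->state != XS_CONNECTED)
        talk_to_back(f);
    front_connect(f);
}
```

Calling `on_backend_connected()` twice on a frontend that starts in `XS_INITIALISED` leaves it in `XS_CONNECTED` with no extra handshake, which is the behavior the commit message asks for.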
  3. 13 Apr, 2016 1 commit
  4. 04 Mar, 2016 1 commit
  5. 30 Jan, 2016 1 commit
  6. 05 Jan, 2016 11 commits
    • K
      xen/blkfront: Fix crash if backend doesn't follow the right states. · c31ecf6c
      Konrad Rzeszutek Wilk committed
      We have split the setting up of all the resources in two steps:
      1) talk_to_blkback  - which figures out the num_ring_pages (from
         the default value of zero), sets up shadow and so on
      2) blkfront_connect - does the real part of filling out the
         internal structures.
      
      The problem is if we bypass the 1) step and go straight to 2)
      and call blkfront_setup_indirect where we use the macro
      BLK_RING_SIZE - which returns a negative value (because
      sz is zero  - since num_ring_pages is zero - since it has never
      been set).
      
      We can fix this by making sure that we always have called
      talk_to_blkback before going to blkfront_connect.
      
      Or we could set in blkfront_probe info->nr_ring_pages = 1
      to have a default value. But that looks odd - as we haven't
      actually negotiated any ring size.
      
      This patch changes XenbusStateConnected state to detect if
      we haven't done the initial handshake - and if so continue
      on as if we were in XenbusStateInitWait state.
      
      We also roll the error recovery (freeing the structure) into
      talk_to_blkback error path - which is safe since that function
      is only called from blkback_changed.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c31ecf6c
    • J
      xen/blkfront: Handle non-indirect grant with 64KB pages · 6cc56833
      Julien Grall committed
      The minimal size of a request in the block framework is always PAGE_SIZE.
      This means that when a 64KB guest is supported, a request will be at
      least 64KB.
      
      However, if the backend doesn't support indirect descriptors (such as QDISK
      in QEMU), a ring request can only accommodate 11 segments of 4KB
      (i.e. 44KB).
      
      The current frontend is assuming that an I/O request will always fit in
      a ring request. This is not true any more when using 64KB page
      granularity and will therefore crash during boot.
      
      On ARM64, the ABI is completely neutral to the page granularity used by
      the domU. The guest has the choice between different page granularity
      supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB).
      This can't be enforced by the hypervisor and therefore it's possible to
      run guests using different page granularity.
      
      So we can't mandate the block backend to support indirect descriptor
      when the frontend is using 64KB page granularity and have to fix it
      properly in the frontend.
      
      The solution described below is based on directly modifying the frontend
      guest rather than asking the block framework to support a smaller size
      (i.e. < PAGE_SIZE). This is because the changes in the block framework are
      not trivial, as everything seems to rely on a struct *page (see [1]).
      It may be possible that someone succeeds in doing so in the future,
      and we would then be able to use it.
      
      Given that a block request may not fit in a single ring request, a
      second request is introduced for the data that cannot fit in the first
      one. This means that the second ring request should never be used on
      Linux if the page size is smaller than 44KB.
      
      To achieve the support of the extra ring request, the block queue size
      is divided by two. Therefore, the ring will always contain enough space
      to accommodate 2 ring requests. While this will reduce the overall
      performance, it will make the implementation more contained. The way
      forward to get better performance is to implement in the backend either
      indirect descriptor or multiple grants ring.
      
      Note that the parameters of the blk_queue_max_* helpers haven't been updated.
      The block code will set the minimum size supported and we may be able
      to directly support any change in the block framework that lowers
      the minimal size of a request.
      
      [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html
      Signed-off-by: Julien Grall <julien.grall@citrix.com>
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      6cc56833
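The arithmetic behind the commit above can be sketched as follows. This is an illustrative calculation only, not code from the driver: `ring_reqs_needed()` and the two constants are hypothetical names, with the values (4KB grants, 11 segments per ring request) taken from the commit message. It shows why a 64KB request needs a second ring request when indirect descriptors are unavailable.

```c
/* Values taken from the commit message above. */
#define GRANT_SIZE         4096u  /* one segment covers one 4KB grant */
#define SEGS_PER_RING_REQ  11u    /* segments that fit in one ring request */

/*
 * Hypothetical helper: number of ring requests needed to carry io_bytes
 * when every ring request holds at most SEGS_PER_RING_REQ segments.
 */
static unsigned int ring_reqs_needed(unsigned int io_bytes)
{
    /* Round up to whole grants, then to whole ring requests. */
    unsigned int grants = (io_bytes + GRANT_SIZE - 1) / GRANT_SIZE;

    return (grants + SEGS_PER_RING_REQ - 1) / SEGS_PER_RING_REQ;
}
```

With these constants, a 64KB request spans 16 grants and therefore needs 2 ring requests, while any request of 44KB or less still fits in 1.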
    • J
      xen-blkfront: Introduce blkif_ring_get_request · 2e073969
      Julien Grall committed
      The code to get a request is always the same. Therefore we can factorize
      it in a single function.
      Signed-off-by: Julien Grall <julien.grall@citrix.com>
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2e073969
    • K
      xen/blocks: Return -EXX instead of -1 · bde21f73
      Konrad Rzeszutek Wilk committed
      Let's return sensible values instead of -1.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      bde21f73
    • P
      xen/blkfront: correct setting for xen_blkif_max_ring_order · 45fc8264
      Peng Fan committed
      According to this piece of code:
      "
           pr_info("Invalid max_ring_order (%d), will use default max: %d.\n",
                    xen_blkif_max_ring_order, XENBUS_MAX_RING_GRANT_ORDER);
      "
      if xen_blkif_max_ring_order is bigger than XENBUS_MAX_RING_GRANT_ORDER,
      xen_blkif_max_ring_order needs to be set to XENBUS_MAX_RING_GRANT_ORDER,
      not 0.
      Signed-off-by: Peng Fan <van.freenix@gmail.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      45fc8264
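The behavior the commit above asks for is a plain clamp. A minimal sketch, assuming a maximum order of 4 purely for illustration (`MAX_RING_GRANT_ORDER` and `clamp_ring_order()` are hypothetical names standing in for XENBUS_MAX_RING_GRANT_ORDER and the module-parameter check):

```c
/* Assumed value, for illustration only. */
#define MAX_RING_GRANT_ORDER 4u

/*
 * Hypothetical helper: an out-of-range request falls back to the maximum
 * supported order, matching the "will use default max" log message,
 * rather than being reset to 0.
 */
static unsigned int clamp_ring_order(unsigned int requested)
{
    if (requested > MAX_RING_GRANT_ORDER)
        return MAX_RING_GRANT_ORDER;
    return requested;
}
```

Falling back to 0 instead would silently select a single-page ring, which contradicts what the log message tells the user.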
    • B
      xen/blkfront: make persistent grants pool per-queue · 73716df7
      Bob Liu committed
      Make persistent grants per-queue/ring instead of per-device, so that we can
      drop the 'dev_lock' and get better scalability.
      
      Test was done based on null_blk driver:
      dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
      domu: v4.2-rc8 16vcpus 10GB
      
      [test]
      rw=read
      direct=1
      ioengine=libaio
      bs=4k
      time_based
      runtime=30
      filename=/dev/xvdb
      numjobs=16
      iodepth=64
      iodepth_batch=64
      iodepth_batch_complete=64
      group_reporting
      
      Queues:             1       4             8             16
      Iops orig(k):       810     1064          780           700
      Iops patched(k):    810     1230(~20%)    1024(~20%)    850(~20%)
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      73716df7
    • B
      xen/blkfront: Remove duplicate setting of ->xbdev. · 75f070b3
      Bob Liu committed
      We do the same exact operations a bit earlier in the
      function.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      75f070b3
    • K
    • B
      xen/blkfront: negotiate number of queues/rings to be used with backend · 28d949bc
      Bob Liu committed
      The max number of hardware queues for xen/blkfront is set by parameter
      'max_queues'(default 4), while it is also capped by the max value that the
      xen/blkback exposes through XenStore key 'multi-queue-max-queues'.
      
      The negotiated number is the smaller one and would be written back to xenstore
      as "multi-queue-num-queues", blkback needs to read this negotiated number.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      28d949bc
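The negotiation rule in the commit above reduces to taking the smaller of two limits. The sketch below is an illustration, not the driver code: `negotiate_nr_queues()` is a hypothetical helper, and the fallback to a single queue when either limit is zero is an assumption added here so the result is always usable.

```c
/*
 * Hypothetical helper: frontend_max models the 'max_queues' parameter,
 * backend_max models the value read from 'multi-queue-max-queues'.
 * The result is what would be written back as 'multi-queue-num-queues'.
 */
static unsigned int negotiate_nr_queues(unsigned int frontend_max,
                                        unsigned int backend_max)
{
    unsigned int n = frontend_max < backend_max ? frontend_max : backend_max;

    /* Assumption: always keep at least one queue. */
    return n ? n : 1;
}
```

For example, a frontend limit of 4 and a backend limit of 2 negotiate 2 queues, and a backend that advertises only 1 queue yields a single ring.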
    • B
      xen/blkfront: split per device io_lock · 11659569
      Bob Liu committed
      After patch "xen/blkfront: separate per ring information out of device
      info", per-ring data is protected by a per-device lock ('io_lock').
      
      This is not a good approach and will affect scalability, so introduce a
      per-ring lock ('ring_lock').
      
      The old 'io_lock' is renamed to 'dev_lock' which protects the ->grants list and
      ->persistent_gnts_c which are shared by all rings.
      
      Note that in 'blkfront_probe' the 'blkfront_info' is setup via kzalloc
      so setting ->persistent_gnts_c to zero is not needed.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      11659569
    • B
      xen/blkfront: pseudo support for multi hardware queues/rings · 3df0e505
      Bob Liu committed
      Preparatory patch for multiple hardware queues (rings). The number of
      rings is unconditionally set to 1, larger number will be enabled in
      patch "xen/blkfront: negotiate number of queues/rings to be used with backend"
      so as to make review easier.
      
      Note that blkfront_gather_backend_features does not call
      blkfront_setup_indirect anymore (as that needs to be done per ring).
      That means that in blkif_recover/blkif_connect we have to do it in a loop
      (bounded by nr_rings).
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3df0e505
  7. 04 Jan, 2016 1 commit
  8. 23 Oct, 2015 5 commits
  9. 08 Oct, 2015 1 commit
  10. 01 Oct, 2015 1 commit
  11. 09 Sep, 2015 1 commit
  12. 20 Aug, 2015 1 commit
  13. 29 Jul, 2015 1 commit
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig committed
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not being persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      4246a0b6
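The idea in the commit above, storing the errno value in the bio itself so that it persists and can travel from child to parent, can be sketched with a toy model. This is not the kernel's struct bio or its completion path: `struct bio_model` and `bio_model_endio()` are hypothetical, and the model completes the parent immediately instead of tracking how many children are still outstanding.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a bio. */
struct bio_model {
    int bi_error;               /* 0 on success, negative errno on failure */
    struct bio_model *parent;   /* next bio up the chain, or NULL */
    int completed;
};

/*
 * Complete a bio. The error is carried in the bio, so the completion
 * function takes no separate error argument and the value survives
 * queueing and chaining.
 */
static void bio_model_endio(struct bio_model *bio)
{
    bio->completed = 1;

    if (bio->parent) {
        /* Propagate the first error seen up the chain. */
        if (bio->bi_error && !bio->parent->bi_error)
            bio->parent->bi_error = bio->bi_error;
        bio_model_endio(bio->parent);
    }
}
```

Because the field holds any errno value, a child that fails with `-ENOSPC` reports exactly that to its parent, rather than collapsing every failure into `-EIO`.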
  14. 24 Jul, 2015 2 commits
  15. 22 Jun, 2015 1 commit
    • B
      drivers: xen-blkfront: only talk_to_blkback() when in XenbusStateInitialising · a9b54bb9
      Bob Liu committed
      Patch 69b91ede
      "drivers: xen-blkback: delay pending_req allocation to connect_ring"
      exposed a problem that Xen blkfront has. There is a race
      with XenStored and the drivers such that we can see two:
      
      vbd vbd-268440320: blkfront:blkback_changed to state 2.
      vbd vbd-268440320: blkfront:blkback_changed to state 2.
      vbd vbd-268440320: blkfront:blkback_changed to state 4.
      
      state changes to XenbusStateInitWait ('2'). The end result is that
      blkback_changed() receives two notifications and calls setup_blkring() twice.
      
      Meanwhile, the backend driver may only see the first setup_blkring(), which
      is wrong: it reads out-dated ring-ref values (or reads them while they are
      being updated with new ones).
      
      The end result is that the ring ends up being incorrectly set.
      
      The other drivers in the tree have such checks already in.
      Reported-and-Tested-by: Robert Butera <robert.butera@oracle.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      a9b54bb9
  16. 17 Jun, 2015 2 commits
  17. 06 Jun, 2015 2 commits
    • B
      xen/block: add multi-page ring support · 86839c56
      Bob Liu committed
      Extend xen/block to support a multi-page ring, so that more requests can be
      issued by using more than one page as the request ring between blkfront
      and backend.
      As a result, performance can improve significantly.
      
      We got some impressive improvements on our high-end iSCSI storage cluster
      backend. When using 64 pages as the ring, the IOPS increased about 15 times
      for the throughput testing and more than doubled for the latency testing.
      
      The reason is that the limit on outstanding requests is 32 with a one-page
      ring, but in our case the iSCSI LUN was spread across about 100 physical
      drives, and 32 was really not enough to keep them busy.
      
      Changes in v2:
       - Rebased to 4.0-rc6.
       - Document on how multi-page ring feature working to linux io/blkif.h.
      
      Changes in v3:
       - Remove changes to linux io/blkif.h and follow the protocol defined
         in io/blkif.h of XEN tree.
       - Rebased to 4.1-rc3
      
      Changes in v4:
       - Turn to use 'ring-page-order' and 'max-ring-page-order'.
       - A few comments from Roger.
      
      Changes in v5:
       - Clarify with 4k granularity to comment
       - Address more comments from Roger
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      86839c56
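The limit of 32 outstanding requests mentioned in the commit above follows from how many request slots fit in one page. The sketch below reproduces that arithmetic under stated assumptions: a 64-byte shared-ring header and 112-byte ring entries are assumed here for illustration (neither figure appears in the commit message), and `round_down_pow2()` and `ring_entries()` are hypothetical helpers. The slot count is rounded down to a power of two, as ring indices wrap with a mask.

```c
/* Assumed layout values, for illustration only. */
#define RING_PAGE_SIZE   4096u
#define RING_HEADER_SIZE 64u    /* assumed shared-ring header size */
#define RING_ENTRY_SIZE  112u   /* assumed size of one request slot */

/* Largest power of two that is <= n (0 for n == 0). */
static unsigned int round_down_pow2(unsigned int n)
{
    unsigned int p = 1;

    if (n == 0)
        return 0;
    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Hypothetical helper: request slots in a ring of (1 << page_order) pages. */
static unsigned int ring_entries(unsigned int page_order)
{
    unsigned int bytes = (1u << page_order) * RING_PAGE_SIZE;

    return round_down_pow2((bytes - RING_HEADER_SIZE) / RING_ENTRY_SIZE);
}
```

Under these assumptions `ring_entries(0)` gives the 32 slots quoted in the commit message, and each increase of the negotiated 'ring-page-order' roughly doubles the number of requests that can be in flight.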
    • B
      driver: xen-blkfront: move talk_to_blkback to a more suitable place · 8ab0144a
      Bob Liu committed
      The major responsibility of talk_to_blkback() is to allocate and initialize
      the request ring and write the ring info to xenstore.
      But this work should be done after the backend has entered 'XenbusStateInitWait',
      as defined in the protocol file.
      See xen/include/public/io/blkif.h in XEN git tree:
      Front                                Back
      =================================    =====================================
      XenbusStateInitialising              XenbusStateInitialising
       o Query virtual device               o Query backend device identification
         properties.                          data.
       o Setup OS device instance.          o Open and validate backend device.
                                            o Publish backend features and
                                              transport parameters.
                                                           |
                                                           |
                                                           V
                                           XenbusStateInitWait
      
      o Query backend features and
        transport parameters.
      o Allocate and initialize the
        request ring.
      
      There is no problem with this yet, but it is a violation of the design and
      furthermore it would not allow frontend/backend to negotiate 'multi-page'
      and 'multi-queue' features.
      
      Changes in v2:
       - Re-write the commit message to be more clear.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      8ab0144a
  18. 15 Apr, 2015 1 commit
  19. 13 Feb, 2015 1 commit
  20. 11 Feb, 2015 1 commit
  21. 11 Dec, 2014 2 commits