1. 24 Jul 2017 (1 commit)
    • xen-blkfront: Fix handling of non-supported operations · 31c4ccc3
      Committed by Bart Van Assche
      This patch fixes the following sparse warnings:
      
      drivers/block/xen-blkfront.c:916:45: warning: incorrect type in argument 2 (different base types)
      drivers/block/xen-blkfront.c:916:45:    expected restricted blk_status_t [usertype] error
      drivers/block/xen-blkfront.c:916:45:    got int [signed] error
      drivers/block/xen-blkfront.c:1599:47: warning: incorrect type in assignment (different base types)
      drivers/block/xen-blkfront.c:1599:47:    expected int [signed] error
      drivers/block/xen-blkfront.c:1599:47:    got restricted blk_status_t [usertype] <noident>
      drivers/block/xen-blkfront.c:1607:55: warning: incorrect type in assignment (different base types)
      drivers/block/xen-blkfront.c:1607:55:    expected int [signed] error
      drivers/block/xen-blkfront.c:1607:55:    got restricted blk_status_t [usertype] <noident>
      drivers/block/xen-blkfront.c:1625:55: warning: incorrect type in assignment (different base types)
      drivers/block/xen-blkfront.c:1625:55:    expected int [signed] error
      drivers/block/xen-blkfront.c:1625:55:    got restricted blk_status_t [usertype] <noident>
      drivers/block/xen-blkfront.c:1628:62: warning: restricted blk_status_t degrades to integer
      
      Compile-tested only.
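
      A minimal sketch of the type change these warnings call for, assuming the usual blk-mq completion path (names mirror xen-blkfront, but the snippet is illustrative rather than the exact upstream diff):

          #include <linux/blk-mq.h>

          /* Per-request PDU: the saved completion status is a blk_status_t now. */
          struct blkif_req {
                  blk_status_t error;     /* was: int error */
          };

          static inline struct blkif_req *blkif_req(struct request *rq)
          {
                  return blk_mq_rq_to_pdu(rq);
          }

          /* Completion path: hand the core a blk_status_t, never a negative errno. */
          static void blkif_complete_rq(struct request *rq)
          {
                  blk_mq_end_request(rq, blkif_req(rq)->error);
          }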
      
      Fixes: commit 2a842aca ("block: introduce new block status code type")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Roger Pau Monné <roger.pau@citrix.com>
      Cc: <xen-devel@lists.xenproject.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 28 Jun 2017 (1 commit)
  3. 19 Jun 2017 (1 commit)
  4. 09 Jun 2017 (3 commits)
    • block: switch bios to blk_status_t · 4e4cbee9
      Committed by Christoph Hellwig
      Replace bi_error with a new bi_status to allow for a clear conversion.
      Note that device mapper overloaded bi_error with a private value, which
      we'll have to keep around at least for now and thus propagate to a
      proper blk_status_t value.
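
      As a rough illustration (hypothetical driver helper, assuming a post-conversion block layer), a completion path now sets bio->bi_status instead of the old bio->bi_error errno field:

          #include <linux/bio.h>

          static void my_end_bio(struct bio *bio, int err)
          {
                  /* translate a driver-internal errno into a block status code */
                  bio->bi_status = errno_to_blk_status(err);  /* e.g. -EIO -> BLK_STS_IOERR */
                  bio_endio(bio);
          }
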
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: switch ->queue_rq return value to blk_status_t · fc17b653
      Committed by Christoph Hellwig
      Use the same values for request completion errors as for the return
      value from ->queue_rq.  BLK_STS_RESOURCE is special cased to cause
      a requeue, and all the others are completed as-is.
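
      A sketch of the resulting ->queue_rq() convention (the my_* driver helpers are hypothetical):

          #include <linux/blk-mq.h>

          static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
                                          const struct blk_mq_queue_data *bd)
          {
                  struct request *rq = bd->rq;

                  if (!my_hw_has_room(hctx))          /* hypothetical resource check */
                          return BLK_STS_RESOURCE;    /* the core requeues the request */

                  blk_mq_start_request(rq);

                  if (my_hw_submit(rq))               /* hypothetical submission */
                          return BLK_STS_IOERR;       /* completed immediately as an error */

                  return BLK_STS_OK;                  /* queued; completion arrives later */
          }
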
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: introduce new block status code type · 2a842aca
      Committed by Christoph Hellwig
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error, a few have overloaded magic meanings.  This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run - those drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low-hanging
      fruit to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
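
      A minimal sketch of that pattern, simplified from include/linux/blk_types.h (the real conversion helpers use a lookup table; the non-zero value below is illustrative):

          #include <linux/types.h>
          #include <linux/errno.h>

          typedef u8 __bitwise blk_status_t;      /* sparse flags int <-> blk_status_t mixing */

          #define BLK_STS_OK      0
          #define BLK_STS_IOERR   ((__force blk_status_t)10)   /* illustrative value */

          static inline int blk_status_to_errno(blk_status_t status)
          {
                  return status == BLK_STS_OK ? 0 : -EIO;      /* simplified mapping */
          }
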
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  5. 21 Apr 2017 (2 commits)
  6. 18 Apr 2017 (1 commit)
  7. 31 Mar 2017 (1 commit)
  8. 01 Feb 2017 (1 commit)
    • block: fold cmd_type into the REQ_OP_ space · aebf526b
      Committed by Christoph Hellwig
      Instead of keeping two levels of indirection for request types, fold it
      all into the operations.  The little caveat here is that previously
      cmd_type only applied to struct request, while the request and bio op
      fields were set to plain REQ_OP_READ/WRITE even for passthrough
      operations.
      
      Instead this patch adds new REQ_OP_* for SCSI passthrough and driver
      private requests, although it has to add two for each so that we
      can communicate the data in/out nature of the request.
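
      For illustration (the helper name is hypothetical; the ops are the ones this patch introduces), a driver can now recognise passthrough requests purely by op code:

          #include <linux/blkdev.h>

          static bool rq_is_scsi_passthrough(struct request *rq)
          {
                  switch (req_op(rq)) {
                  case REQ_OP_SCSI_IN:    /* SCSI passthrough, data from the device */
                  case REQ_OP_SCSI_OUT:   /* SCSI passthrough, data to the device */
                          return true;
                  default:
                          return false;
                  }
          }
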
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  9. 24 Jan 2017 (2 commits)
  10. 07 Nov 2016 (1 commit)
  11. 03 Nov 2016 (2 commits)
    • blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request() · 2b053aca
      Committed by Bart Van Assche
      Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls
      are followed by kicking the requeue list. Hence add an argument to
      these two functions that allows the caller to kick the requeue list. This was
      proposed by Christoph Hellwig.
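
      A sketch of the resulting calling convention (the surrounding helper is hypothetical):

          #include <linux/blk-mq.h>

          static void my_requeue(struct request *rq)
          {
                  /* requeue and kick the requeue list in a single call;
                   * passing false instead would still require a separate
                   * blk_mq_kick_requeue_list(rq->q) afterwards. */
                  blk_mq_requeue_request(rq, true);
          }
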
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: Avoid that requeueing starts stopped queues · 52d7f1b5
      Committed by Bart Van Assche
      Since blk_mq_requeue_work() starts stopped queues, and since
      execution of this function can be scheduled after a queue has
      been stopped, it is not possible to stop queues without using
      an additional state variable to track whether or not the queue
      has been stopped. Hence modify blk_mq_requeue_work() such that it
      does not start stopped queues. My conclusion after a review of
      the blk_mq_stop_hw_queues() and blk_mq_{delay_,}kick_requeue_list()
      callers is as follows:
      * In the dm driver starting and stopping queues should only happen
        if __dm_suspend() or __dm_resume() is called and not if the
        requeue list is processed.
      * In the SCSI core queue stopping and starting should only be
        performed by the scsi_internal_device_block() and
        scsi_internal_device_unblock() functions but not by any other
        function. Although the blk_mq_stop_hw_queue() call in
        scsi_queue_rq() may help to reduce CPU load if a LLD queue is
        full, figuring out whether or not a queue should be restarted
        when requeueing a command would require introducing additional
        locking in scsi_mq_requeue_cmd() to avoid a race with
        scsi_internal_device_block(). Avoid this complexity by removing
        the blk_mq_stop_hw_queue() call from scsi_queue_rq().
      * In the NVMe core only the functions that call
        blk_mq_start_stopped_hw_queues() explicitly should start stopped
        queues.
      * A blk_mq_start_stopped_hw_queues() call must be added in the
        xen-blkfront driver, in its blkif_recover() function (see the
        sketch below).
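
      A simplified sketch of that last point (structure fields mirror xen-blkfront; this is not the literal upstream diff):

          #include <linux/blk-mq.h>

          static int blkif_recover(struct blkfront_info *info)
          {
                  /* ... rings re-initialised, saved requests resubmitted ... */

                  /* the requeue work no longer starts stopped queues, so do it here */
                  blk_mq_start_stopped_hw_queues(info->rq, true);

                  return 0;
          }
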
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Roger Pau Monné <roger.pau@citrix.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: James Bottomley <jejb@linux.vnet.ibm.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  12. 15 Sep 2016 (1 commit)
  13. 20 Aug 2016 (3 commits)
  14. 22 Jul 2016 (1 commit)
  15. 30 Jun 2016 (1 commit)
    • xen-blkfront: save uncompleted reqs in blkfront_resume() · 7b427a59
      Committed by Bob Liu
      Uncompleted reqs used to be 'saved and resubmitted' in blkfront_recover() during
      migration, but that's too late after multi-queue was introduced.
      
      After a migration to another host (which may not have multiqueue support), the
      number of rings (block hardware queues) may change, and the ring and shadow
      structures will also be reallocated.
      
      blkfront_recover() then can't 'save and resubmit' the real
      uncompleted reqs because the shadow structures have been reallocated.
      
      This patch fixes this issue by moving the 'save' logic out of
      blkfront_recover() to an earlier place, in blkfront_resume().
      
      The 'resubmit' part is unchanged and still happens in blkfront_recover().
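
      A rough ordering sketch of the change (blkfront_save_requests() is a hypothetical stand-in for the saving code; this is not the literal diff):

          static int blkfront_resume(struct xenbus_device *dev)
          {
                  struct blkfront_info *info = dev_get_drvdata(&dev->dev);

                  /* 1. Save still-outstanding requests while the old shadow
                   *    structures are still valid. */
                  blkfront_save_requests(info);           /* hypothetical helper */

                  /* 2. Renegotiate with blkback; rings and shadow structures may be
                   *    reallocated with a different queue count.  blkif_recover(),
                   *    called once reconnected, resubmits the requests saved above. */
                  return talk_to_blkback(dev, info);
          }
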
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: stable@vger.kernel.org
  16. 28 Jun 2016 (1 commit)
    • block: convert to device_add_disk() · 0d52c756
      Committed by Dan Williams
      For block drivers that specify a parent device, convert them to use
      device_add_disk().
      
      This conversion was done with the following semantic patch:
      
          @@
          struct gendisk *disk;
          expression E;
          @@
      
          - disk->driverfs_dev = E;
          ...
          - add_disk(disk);
          + device_add_disk(E, disk);
      
          @@
          struct gendisk *disk;
          expression E1, E2;
          @@
      
          - disk->driverfs_dev = E1;
          ...
          E2 = disk;
          ...
          - add_disk(E2);
          + device_add_disk(E1, E2);
      
      ...plus some manual fixups for a few missed conversions.
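
      For reference, the resulting call pattern in a converted driver looks roughly like this (disk and pdev are placeholders; device_add_disk() takes the parent device and the gendisk at this point in the series):

          /* before */
          disk->driverfs_dev = &pdev->dev;
          add_disk(disk);

          /* after */
          device_add_disk(&pdev->dev, disk);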
      
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  17. 09 Jun 2016 (3 commits)
    • block: add a separate operation type for secure erase · 288dab8a
      Committed by Christoph Hellwig
      Add a separate operation type for secure erase instead of overloading
      the discard support with the REQ_SECURE flag.
      Use the opportunity to rename the queue flag as well, and remove the
      dead checks for this flag in the RAID 1 and RAID 10 drivers that don't
      claim support for secure erase.
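
      Illustration (hypothetical helper name): with the new op, a driver checks for secure erase directly instead of testing REQ_SECURE on a discard:

          #include <linux/blkdev.h>

          static bool my_req_is_secure_erase(struct request *rq)
          {
                  return req_op(rq) == REQ_OP_SECURE_ERASE;
          }
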
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • xen-blkfront: fix resume issues after a migration · 2a6f71ad
      Committed by Bob Liu
      After a migration to another host (which may not have multiqueue
      support), the number of rings (block hardware queues)
      may change and the ring info structure will also be reallocated.

      This patch fixes two related bugs:
       * Call blk_mq_update_nr_hw_queues() to make blk-core aware that the
         number of hardware queues has changed.
       * Don't store the rinfo pointer in hctx->driver_data, because rinfo may
         be reallocated; use hctx->queue_num to look up the rinfo structure
         instead (see the sketch below).
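
      A sketch of that second point (helper name hypothetical, structure names mirror xen-blkfront): look the ring info up by hardware-queue number on every dispatch instead of caching a pointer that a migration may invalidate.

          static struct blkfront_ring_info *get_rinfo(struct blk_mq_hw_ctx *hctx)
          {
                  struct blkfront_info *info = hctx->queue->queuedata;

                  return &info->rinfo[hctx->queue_num];
          }
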
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen-blkfront: don't call talk_to_blkback when already connected to blkback · efd15352
      Committed by Bob Liu
      Sometimes blkfront may receive the blkback_changed() notification
      (XenbusStateConnected) twice after migration, which will cause
      talk_to_blkback() to be called twice as well and confuse xen-blkback.
      
      The flow is as follows:
         blkfront                                        blkback
      blkfront_resume()
       > talk_to_blkback()
        > Set blkfront to XenbusStateInitialised
                                                      front changed()
                                                       > Connect()
                                                        > Set blkback to XenbusStateConnected
      
      blkback_changed()
       > Skip talk_to_blkback()
         because frontstate == XenbusStateInitialised
       > blkfront_connect()
        > Set blkfront to XenbusStateConnected
      
      -----
      And here we get another XenbusStateConnected notification leading
      to:
      -----
      blkback_changed()
       > because now frontstate != XenbusStateInitialised
         talk_to_blkback() is also called again
        > blkfront state changed from
        XenbusStateConnected to XenbusStateInitialised
          (Which is not correct!)
      
      						front_changed():
                                                       > Do nothing because blkback
                                                         already in XenbusStateConnected
      
      Now blkback is in XenbusStateConnected but blkfront is still
      in XenbusStateInitialised - leading to no disks.
      
      Poking of the XenbusStateConnected state is allowed (to deal with
      block disk change) and has to be dealt with. The most likely
      cause of this bug is custom udev scripts hooking up the disks
      and then validating the size.
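
      A rough sketch of the resulting guard in blkback_changed(), folded together with the neighbouring state-handling fixes (simplified, not the literal upstream diff):

          case XenbusStateConnected:
                  /* Renegotiate only if the handshake hasn't happened yet; a
                   * repeated Connected notification (e.g. to signal a disk
                   * size change) must not restart the handshake. */
                  if (dev->state != XenbusStateInitialised &&
                      dev->state != XenbusStateConnected) {
                          if (talk_to_blkback(dev, info))
                                  break;
                  }
                  blkfront_connect(info);
                  break;
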
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  18. 08 Jun 2016 (4 commits)
  19. 13 Apr 2016 (1 commit)
  20. 04 Mar 2016 (1 commit)
  21. 30 Jan 2016 (1 commit)
  22. 05 Jan 2016 (7 commits)
    • xen/blkfront: Fix crash if backend doesn't follow the right states. · c31ecf6c
      Committed by Konrad Rzeszutek Wilk
      We have split the setting up of all the resources into two steps:
      1) talk_to_blkback  - which figures out the num_ring_pages (from
         the default value of zero), sets up the shadow structures, and so on.
      2) blkfront_connect - does the real part of filling out the
         internal structures.
      
      The problem is if we bypass step 1) and go straight to 2)
      and call blkfront_setup_indirect, where we use the macro
      BLK_RING_SIZE - which returns a negative value (because
      sz is zero, since num_ring_pages is zero, since it has never
      been set).
      
      We can fix this by making sure that we always have called
      talk_to_blkback before going to blkfront_connect.
      
      Or we could set info->nr_ring_pages = 1 in blkfront_probe
      to have a default value. But that looks odd, as we haven't
      actually negotiated any ring size.
      
      This patch changes the XenbusStateConnected handling to detect if
      we haven't done the initial handshake - and if so, continue
      on as if we were in the XenbusStateInitWait state.
      
      We also roll the error recovery (freeing the structure) into the
      talk_to_blkback error path - which is safe since that function
      is only called from blkback_changed.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkfront: Handle non-indirect grant with 64KB pages · 6cc56833
      Committed by Julien Grall
      The minimal size of a request in the block framework is always PAGE_SIZE.
      This means that when 64KB guests are supported, a request will be at
      least 64KB.
      
      However, if the backend doesn't support indirect descriptors (such as QDISK
      in QEMU), a ring request is only able to accommodate 11 segments of 4KB
      (i.e. 44KB).
      
      The current frontend assumes that an I/O request will always fit in
      a ring request. This is no longer true when using 64KB page
      granularity, and the frontend will therefore crash during boot.
      
      On ARM64, the ABI is completely neutral to the page granularity used by
      the domU. The guest has the choice between the different page granularities
      supported by the processor (for instance on ARM64: 4KB, 16KB, 64KB).
      This can't be enforced by the hypervisor, and therefore it's possible to
      run guests using different page granularities.
      
      So we can't mandate that the block backend support indirect descriptors
      when the frontend is using 64KB page granularity, and we have to fix it
      properly in the frontend.
      
      The solution exposed below is based on directly modifying the frontend
      guest rather than asking the block framework to support smaller sizes
      (i.e. < PAGE_SIZE). This is because the changes in the block framework are
      not trivial, as everything seems to rely on a struct page (see [1]).
      It may be possible that someone succeeds in doing it in the future,
      and we would then be able to use it.
      
      Given that a block request may not fit in a single ring request, a
      second request is introduced for the data that cannot fit in the first
      one. This means that the second ring request should never be used on
      Linux if the page size is smaller than 44KB.
      
      To achieve the support of the extra ring request, the block queue size
      is divided by two. Therefore, the ring will always contain enough space
      to accommodate 2 ring requests. While this will reduce the overall
      performance, it will make the implementation more contained. The way
      forward to get better performance is to implement in the backend either
      indirect descriptor or multiple grants ring.
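
      An illustrative sketch of the sizing rule described above (macro and field names mirror xen-blkfront, but treat this as pseudocode rather than the exact diff):

          static unsigned int blkif_queue_depth(struct blkfront_info *info)
          {
                  unsigned int depth = BLK_RING_SIZE(info);

                  /* Without indirect descriptors, a request of up to PAGE_SIZE (64KB)
                   * exceeds the 11 * 4KB = 44KB a single ring request can carry, so a
                   * block request may need two slots: advertise only half the ring. */
                  if (PAGE_SIZE > XEN_PAGE_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST &&
                      !info->max_indirect_segments)
                          depth /= 2;

                  return depth;
          }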
      
      Note that the blk_queue_max_* helper parameters haven't been updated.
      The block code will set the minimum size supported, and we may be able
      to directly support any change in the block framework that lowers
      the minimal size of a request.
      
      [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html
      Signed-off-by: Julien Grall <julien.grall@citrix.com>
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen-blkfront: Introduce blkif_ring_get_request · 2e073969
      Committed by Julien Grall
      The code to get a request is always the same. Therefore we can factor
      it out into a single function.
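
      A simplified version of the factored-out helper (close to, but not necessarily identical with, the upstream function):

          static unsigned long blkif_ring_get_request(struct blkfront_ring_info *rinfo,
                                                      struct request *req,
                                                      struct blkif_request **ring_req)
          {
                  unsigned long id;

                  *ring_req = RING_GET_REQUEST(&rinfo->ring, rinfo->ring.req_prod_pvt);
                  rinfo->ring.req_prod_pvt++;

                  id = get_id_from_freelist(rinfo);
                  rinfo->shadow[id].request = req;     /* remember the originating request */
                  (*ring_req)->u.rw.id = id;

                  return id;
          }
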
      Signed-off-by: Julien Grall <julien.grall@citrix.com>
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blocks: Return -EXX instead of -1 · bde21f73
      Committed by Konrad Rzeszutek Wilk
      Let's return sensible error values instead of -1.
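
      Illustration only (the helper below is hypothetical, not the actual call site):

          static int alloc_ring_info(struct blkfront_info *info)
          {
                  info->rinfo = kcalloc(info->nr_rings, sizeof(*info->rinfo), GFP_KERNEL);
                  if (!info->rinfo)
                          return -ENOMEM;     /* was: return -1 */
                  return 0;
          }
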
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkfront: correct setting for xen_blkif_max_ring_order · 45fc8264
      Committed by Peng Fan
      According to this piece of code:
      "
           pr_info("Invalid max_ring_order (%d), will use default max: %d.\n",
                    xen_blkif_max_ring_order, XENBUS_MAX_RING_GRANT_ORDER);
      "
      if xen_blkif_max_ring_order is bigger than XENBUS_MAX_RING_GRANT_ORDER,
      xen_blkif_max_ring_order needs to be set to XENBUS_MAX_RING_GRANT_ORDER,
      not 0.
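
      In other words, the clamping should look roughly like this (sketch, not the literal diff):

          if (xen_blkif_max_ring_order > XENBUS_MAX_RING_GRANT_ORDER) {
                  pr_info("Invalid max_ring_order (%d), will use default max: %d.\n",
                          xen_blkif_max_ring_order, XENBUS_MAX_RING_GRANT_ORDER);
                  xen_blkif_max_ring_order = XENBUS_MAX_RING_GRANT_ORDER;   /* not 0 */
          }
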
      Signed-off-by: Peng Fan <van.freenix@gmail.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkfront: make persistent grants pool per-queue · 73716df7
      Committed by Bob Liu
      Make persistent grants per-queue/ring instead of per-device, so that we can
      drop the 'dev_lock' and get better scalability.
      
      Test was done based on null_blk driver:
      dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
      domu: v4.2-rc8 16vcpus 10GB
      
      [test]
      rw=read
      direct=1
      ioengine=libaio
      bs=4k
      time_based
      runtime=30
      filename=/dev/xvdb
      numjobs=16
      iodepth=64
      iodepth_batch=64
      iodepth_batch_complete=64
      group_reporting
      
      Queues:              1        4            8            16
      IOPS orig (k):     810     1064          780           700
      IOPS patched (k):  810     1230 (~20%)  1024 (~20%)    850 (~20%)
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkfront: Remove duplicate setting of ->xbdev. · 75f070b3
      Committed by Bob Liu
      We do the exact same operation a bit earlier in the
      function.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>