1. 30 Jan, 2016 1 commit
  2. 05 Jan, 2016 19 commits
    • K
      xen/blkfront: Fix crash if backend doesn't follow the right states. · c31ecf6c
      Konrad Rzeszutek Wilk committed
      We have split the setting up of all the resources in two steps:
      1) talk_to_blkback  - which figures out the nr_ring_pages (from
         the default value of zero), sets up shadow and so on
      2) blkfront_connect - does the real part of filling out the
         internal structures.
      
      The problem is if we bypass the 1) step and go straight to 2)
      and call blkfront_setup_indirect where we use the macro
      BLK_RING_SIZE - which returns a negative value (because
      sz is zero  - since nr_ring_pages is zero - since it has never
      been set).
      
      We can fix this by making sure that we always have called
      talk_to_blkback before going to blkfront_connect.
      
      Or we could set in blkfront_probe info->nr_ring_pages = 1
      to have a default value. But that looks odd - as we haven't
      actually negotiated any ring size.
      
      This patch changes XenbusStateConnected state to detect if
      we haven't done the initial handshake - and if so continue
      on as if we were in XenbusStateInitWait state.
      
      We also roll the error recovery (freeing the structure) into
      talk_to_blkback error path - which is safe since that function
      is only called from blkback_changed.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c31ecf6c
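The two-step setup and the guard described in this commit can be pictured with a minimal sketch. Everything here is a simplified, hypothetical stand-in (the `front_info` structure, the state enum, and the function names are illustrative, not the kernel's definitions); it only shows the idea that the Connected handler runs the handshake first when it has not happened yet.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the xenbus states and the
 * frontend's per-device bookkeeping; not the kernel's definitions. */
enum backend_state { STATE_INIT_WAIT, STATE_CONNECTED };

struct front_info {
    unsigned int nr_ring_pages; /* 0 until the ring size is negotiated */
    bool connected;
};

/* Step 1: negotiate the ring size and set up per-ring resources. */
static int talk_to_backend(struct front_info *info)
{
    info->nr_ring_pages = 1; /* assumed outcome of the negotiation */
    return 0;
}

/* Step 2: fill out internal structures; relies on nr_ring_pages != 0. */
static int connect_frontend(struct front_info *info)
{
    if (info->nr_ring_pages == 0)
        return -EINVAL; /* the situation the patch guards against */
    info->connected = true;
    return 0;
}

/* Called whenever the backend changes state. */
static int backend_changed(struct front_info *info, enum backend_state state)
{
    switch (state) {
    case STATE_INIT_WAIT:
        return talk_to_backend(info);
    case STATE_CONNECTED:
        /* Backend skipped InitWait: do the handshake first. */
        if (info->nr_ring_pages == 0) {
            int err = talk_to_backend(info);
            if (err)
                return err;
        }
        return connect_frontend(info);
    }
    return -EINVAL;
}
```

With this shape, a backend that jumps straight to the connected state still ends up with a negotiated ring size before step 2 runs.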
    • B
      xen/blkback: Fix two memory leaks. · 93bb277f
      Bob Liu committed
      This patch fixes two memory leaks:
        backtrace:
          [<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50
          [<ffffffff81205e3b>] kmem_cache_alloc+0xbb/0x1d0
          [<ffffffff81534028>] xen_blkbk_probe+0x58/0x230
          [<ffffffff8146adb6>] xenbus_dev_probe+0x76/0x130
          [<ffffffff81511716>] driver_probe_device+0x166/0x2c0
          [<ffffffff815119bc>] __device_attach_driver+0xac/0xb0
          [<ffffffff8150fa57>] bus_for_each_drv+0x67/0x90
          [<ffffffff81511ab7>] __device_attach+0xc7/0x120
          [<ffffffff81511b23>] device_initial_probe+0x13/0x20
          [<ffffffff8151059a>] bus_probe_device+0x9a/0xb0
          [<ffffffff8150f0a1>] device_add+0x3b1/0x5c0
          [<ffffffff8150f47e>] device_register+0x1e/0x30
          [<ffffffff8146a9e8>] xenbus_probe_node+0x158/0x170
          [<ffffffff8146abaf>] xenbus_dev_changed+0x1af/0x1c0
          [<ffffffff8146b1bb>] backend_changed+0x1b/0x20
          [<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160
      unreferenced object 0xffff880007ba8ef8 (size 224):
      
        backtrace:
          [<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50
          [<ffffffff81205c73>] __kmalloc+0xd3/0x1e0
          [<ffffffff81534d87>] frontend_changed+0x2c7/0x580
          [<ffffffff8146af12>] xenbus_otherend_changed+0xa2/0xb0
          [<ffffffff8146b2c0>] frontend_changed+0x10/0x20
          [<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160
          [<ffffffff810d3e97>] kthread+0xd7/0xf0
          [<ffffffff817c4a9f>] ret_from_fork+0x3f/0x70
          [<ffffffffffffffff>] 0xffffffffffffffff
      unreferenced object 0xffff8800048dcd38 (size 224):
      
      The first leak is caused by not doing a put() on the be->blkif reference
      which we had gotten in xen_blkif_alloc(), while the second is
      caused by not freeing blkif->rings in the right place.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      93bb277f
    • B
      xen/blkback: make st_ statistics per ring · db6fbc10
      Bob Liu committed
      Make the st_* statistics per ring; the VBD sysfs code now iterates over all the
      rings.
      
      Note: xenvbd_sysfs_delif() is called in xen_blkbk_remove() before all rings
      are torn down, so it's safe.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ---
      v2: Aligned the variables on the same column.
      db6fbc10
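As a rough illustration of per-ring statistics, the sketch below keeps counters in each ring and has the reporting side sum them across all rings. The structure layout and function names are hypothetical and only loosely modelled on the st_* counters mentioned above.

```c
#include <stddef.h>

/* Hypothetical per-ring counters, loosely modelled on the st_* fields. */
struct ring_stats {
    unsigned long long st_rd_req; /* read requests seen on this ring */
    unsigned long long st_wr_req; /* write requests seen on this ring */
};

/* Hypothetical device with an array of per-ring counters. */
struct vbd {
    struct ring_stats *rings;
    size_t nr_rings;
};

/* Device-wide read count: sum over all rings. */
static unsigned long long vbd_total_rd_req(const struct vbd *vbd)
{
    unsigned long long total = 0;

    for (size_t i = 0; i < vbd->nr_rings; i++)
        total += vbd->rings[i].st_rd_req;
    return total;
}

/* Device-wide write count: sum over all rings. */
static unsigned long long vbd_total_wr_req(const struct vbd *vbd)
{
    unsigned long long total = 0;

    for (size_t i = 0; i < vbd->nr_rings; i++)
        total += vbd->rings[i].st_wr_req;
    return total;
}
```

Each ring updates only its own counters, so no shared counter is contended; the cost moves to the comparatively rare read of the totals.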
    • J
      xen/blkfront: Handle non-indirect grant with 64KB pages · 6cc56833
      Julien Grall committed
      The minimal size of a request in the block framework is always PAGE_SIZE.
      This means that when a 64KB guest is supported, a request will be at
      least 64KB.
      
      However, if the backend doesn't support indirect descriptors (such as QDISK
      in QEMU), a ring request can only accommodate 11 segments of 4KB
      (i.e. 44KB).
      
      The current frontend is assuming that an I/O request will always fit in
      a ring request. This is not true any more when using 64KB page
      granularity and will therefore crash during boot.
      
      On ARM64, the ABI is completely neutral to the page granularity used by
      the domU. The guest can choose between the page granularities
      supported by the processor (for instance on ARM64: 4KB, 16KB, 64KB).
      This can't be enforced by the hypervisor and therefore it's possible to
      run guests using different page granularities.
      
      So we can't mandate that the block backend support indirect descriptors
      when the frontend is using 64KB page granularity, and have to fix it
      properly in the frontend.
      
      The solution described below modifies the frontend directly
      rather than asking the block framework to support a smaller request size
      (i.e. < PAGE_SIZE). This is because the changes to the block framework are
      not trivial, as everything seems to rely on a struct *page (see [1]).
      It may be possible for someone to do this in the future,
      in which case we would be able to use it.
      
      Given that a block request may not fit in a single ring request, a
      second request is introduced for the data that cannot fit in the first
      one. This means that the second ring request should never be used on
      Linux if the page size is smaller than 44KB.
      
      To achieve the support of the extra ring request, the block queue size
      is divided by two. Therefore, the ring will always contain enough space
      to accommodate 2 ring requests. While this will reduce the overall
      performance, it will make the implementation more contained. The way
      forward to get better performance is to implement either indirect
      descriptors or multiple-grant rings in the backend.
      
      Note that the parameters of the blk_queue_max_* helpers haven't been updated.
      The block code will set the minimum size supported, so we may be able
      to directly support any change in the block framework that lowers
      the minimal size of a request.
      
      [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html
      Signed-off-by: Julien Grall <julien.grall@citrix.com>
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      6cc56833
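The arithmetic behind the second ring request can be checked with a short sketch. The 4KB segment size and the limit of 11 segments per ring request come from the commit message; the macro and function names are made up for illustration.

```c
#include <assert.h>

#define GRANT_SIZE        4096u /* segment size from the message: 4KB */
#define SEGS_PER_RING_REQ 11u   /* segments in one non-indirect ring request */

/* Number of 4KB segments needed to cover one guest page. */
static unsigned int segments_for_page(unsigned int page_size)
{
    assert(page_size > 0);
    return (page_size + GRANT_SIZE - 1) / GRANT_SIZE;
}

/* Number of ring requests needed for a block request of one guest page. */
static unsigned int ring_requests_for_page(unsigned int page_size)
{
    unsigned int segs = segments_for_page(page_size);

    return (segs + SEGS_PER_RING_REQ - 1) / SEGS_PER_RING_REQ;
}
```

A 64KB page needs 16 segments, which exceeds the 11 that fit in one ring request, so two ring requests are required; 4KB and 16KB pages need 1 and 4 segments and fit in a single request.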
    • J
      xen-blkfront: Introduce blkif_ring_get_request · 2e073969
      Julien Grall committed
      The code to get a request is always the same. Therefore we can factor
      it out into a single function.
      Signed-off-by: Julien Grall <julien.grall@citrix.com>
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2e073969
    • J
      xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule() · a6e7af12
      Jiri Kosina committed
      xen_blkif_schedule() kthread calls try_to_freeze() at the beginning of
      every attempt to purge the LRU. This operation can't ever succeed though,
      as the kthread hasn't marked itself as freezable.
      
      Before (hopefully eventually) kthread freezing gets converted to filesystem
      freezing, we'd rather mark xen_blkif_schedule() freezable (as it can
      generate I/O during suspend).
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      a6e7af12
    • K
      xen/blkback: Free resources if connect_ring failed. · 2d0382fa
      Konrad Rzeszutek Wilk committed
      With the multi-queue support we could fail at setting up
      some of the rings and fail the connection. That meant that
      all resources tied to rings[0..n-1] (where n is the ring
      that failed to be set up) were left allocated. Eventually the
      frontend will switch states and we will call xen_blkif_disconnect.
      
      However we do not want to be at the mercy of the frontend
      deciding when to change states. This patch allows us to do the
      cleanup right away and free the resources.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      2d0382fa
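The "clean up right away" idea can be sketched as follows. This is not the blkback code: the `ring` structure, the helper names, and the `fail_at` parameter (a hook that simulates a setup failure on one ring) are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ring with a single owned resource. */
struct ring {
    void *buf;
};

/* Release the resources of the first n rings. */
static void disconnect_rings(struct ring *rings, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        free(rings[i].buf);
        rings[i].buf = NULL;
    }
}

/* Set up n rings; fail_at simulates a failure on that ring index. */
static int connect_rings(struct ring *rings, size_t n, size_t fail_at)
{
    for (size_t i = 0; i < n; i++) {
        rings[i].buf = (i == fail_at) ? NULL : malloc(64);
        if (!rings[i].buf) {
            /* Undo rings[0..i-1] now instead of waiting for the peer. */
            disconnect_rings(rings, i);
            return -ENOMEM;
        }
    }
    return 0;
}
```

The caller sees either a fully connected set of rings or none at all, so nothing stays allocated while waiting for the other end to change state.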
    • K
      xen/blocks: Return -EXX instead of -1 · bde21f73
      Konrad Rzeszutek Wilk committed
      Let's return sensible values instead of -1.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      bde21f73
    • B
      xen/blkback: make pool of persistent grants and free pages per-queue · d4bf0065
      Bob Liu committed
      Make pool of persistent grants and free pages per-queue/ring instead of
      per-device to get better scalability.
      
      Test was done based on null_blk driver:
      dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
      domu: v4.2-rc8 16vcpus 10GB
      
      [test]
      rw=read
      direct=1
      ioengine=libaio
      bs=4k
      time_based
      runtime=30
      filename=/dev/xvdb
      numjobs=16
      iodepth=64
      iodepth_batch=64
      iodepth_batch_complete=64
      group_reporting
      
      Results:
      iops1: After patch "xen/blkfront: make persistent grants per-queue".
      iops2: After this patch.
      
      Queues:          1      4             8             16
      Iops orig(k):    810    1064          780           700
      Iops1(k):        810    1230(~20%)    1024(~20%)    850(~20%)
      Iops2(k):        810    1410(~35%)    1354(~75%)    1440(~100%)
      
      With 4 queues after this commit we can get a ~75% increase in IOPS, and
      performance no longer drops as the number of queues increases.
      
      Please find the respective chart in this link:
      https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d4bf0065
    • B
      xen/blkback: get the number of hardware queues/rings from blkfront · d62d8600
      Bob Liu committed
      The backend advertises "multi-queue-max-queues" to the frontend and reads the
      negotiated number from "multi-queue-num-queues", which blkfront writes.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d62d8600
    • K
      xen/blkback: pseudo support for multi hardware queues/rings · 2fb1ef4f
      Konrad Rzeszutek Wilk committed
      Preparatory patch for multiple hardware queues (rings). The number of
      rings is unconditionally set to 1; a larger number will be enabled in
      "xen/blkback: get the number of hardware queues/rings from blkfront".
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ---
      v2: Align variables in the structures.
      2fb1ef4f
    • B
      xen/blkback: separate ring information out of struct xen_blkif · 59795700
      Bob Liu committed
      Split the per-ring information out into a new structure "xen_blkif_ring", so that one vbd
      device can be associated with one or more rings/hardware queues.
      
      Introduce 'pers_gnts_lock' to protect the pool of persistent grants since we
      may have multiple backend threads.
      
      This patch is a preparation for supporting multiple hardware queues/rings.
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ---
      v2: Align the variables in the structure.
      59795700
    • P
      xen/blkfront: correct setting for xen_blkif_max_ring_order · 45fc8264
      Peng Fan committed
      According to this piece of code:
      "
           pr_info("Invalid max_ring_order (%d), will use default max: %d.\n",
                    xen_blkif_max_ring_order, XENBUS_MAX_RING_GRANT_ORDER);
      "
      if xen_blkif_max_ring_order is bigger than XENBUS_MAX_RING_GRANT_ORDER,
      we need to set xen_blkif_max_ring_order to XENBUS_MAX_RING_GRANT_ORDER,
      not to 0.
      Signed-off-by: Peng Fan <van.freenix@gmail.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      45fc8264
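The intended behaviour amounts to clamping the requested order to the maximum rather than resetting it. A minimal sketch, assuming an illustrative maximum of 4 (the real constant is XENBUS_MAX_RING_GRANT_ORDER from the Xen headers) and a hypothetical helper name:

```c
/* Assumed value for illustration only; the real limit is
 * XENBUS_MAX_RING_GRANT_ORDER in the Xen headers. */
#define MAX_RING_GRANT_ORDER 4u

/* Clamp a requested ring order to the supported maximum. */
static unsigned int clamp_ring_order(unsigned int requested)
{
    if (requested > MAX_RING_GRANT_ORDER)
        return MAX_RING_GRANT_ORDER; /* use the max, not 0 */
    return requested;
}
```

This matches what the log message promises ("will use default max"): an oversized request falls back to the largest supported order instead of the smallest.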
    • B
      xen/blkfront: make persistent grants pool per-queue · 73716df7
      Bob Liu committed
      Make persistent grants per-queue/ring instead of per-device, so that we can
      drop the 'dev_lock' and get better scalability.
      
      Test was done based on null_blk driver:
      dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
      domu: v4.2-rc8 16vcpus 10GB
      
      [test]
      rw=read
      direct=1
      ioengine=libaio
      bs=4k
      time_based
      runtime=30
      filename=/dev/xvdb
      numjobs=16
      iodepth=64
      iodepth_batch=64
      iodepth_batch_complete=64
      group_reporting
      
      Queues:             1      4             8             16
      Iops orig(k):       810    1064          780           700
      Iops patched(k):    810    1230(~20%)    1024(~20%)    850(~20%)
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      73716df7
    • B
      xen/blkfront: Remove duplicate setting of ->xbdev. · 75f070b3
      Bob Liu committed
      We do the exact same operations a bit earlier in the
      function.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      75f070b3
    • K
    • B
      xen/blkfront: negotiate number of queues/rings to be used with backend · 28d949bc
      Bob Liu committed
      The max number of hardware queues for xen/blkfront is set by the parameter
      'max_queues' (default 4), and it is also capped by the max value that
      xen/blkback exposes through the XenStore key 'multi-queue-max-queues'.
      
      The negotiated number is the smaller of the two and is written back to xenstore
      as "multi-queue-num-queues"; blkback needs to read this negotiated number.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      28d949bc
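The negotiation rule reduces to taking the smaller of the two limits. The sketch below uses a hypothetical helper name; the fallback to a single queue when either side reports zero is an assumption added for illustration, not something the commit message states.

```c
/* Pick the number of queues both ends can handle.
 * front_max: the frontend's 'max_queues' parameter.
 * back_max:  the value the backend exposes as 'multi-queue-max-queues'. */
static unsigned int negotiate_nr_queues(unsigned int front_max,
                                        unsigned int back_max)
{
    unsigned int n = front_max < back_max ? front_max : back_max;

    /* Assumption: always keep at least one queue. */
    return n ? n : 1;
}
```

For example, a frontend limit of 4 against a backend limit of 16 yields 4 queues, which is the number the frontend would then write back as "multi-queue-num-queues".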
    • B
      xen/blkfront: split per device io_lock · 11659569
      Bob Liu committed
      After patch "xen/blkfront: separate per ring information out of device
      info", per-ring data is protected by a per-device lock ('io_lock').
      
      This hurts scalability, so introduce a
      per-ring lock ('ring_lock').
      
      The old 'io_lock' is renamed to 'dev_lock', which protects the ->grants list and
      ->persistent_gnts_c, which are shared by all rings.
      
      Note that in 'blkfront_probe' the 'blkfront_info' is set up via kzalloc
      so setting ->persistent_gnts_c to zero is not needed.
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      11659569
    • B
      xen/blkfront: pseudo support for multi hardware queues/rings · 3df0e505
      Bob Liu committed
      Preparatory patch for multiple hardware queues (rings). The number of
      rings is unconditionally set to 1; a larger number will be enabled in
      patch "xen/blkfront: negotiate number of queues/rings to be used with backend"
      so as to make review easier.
      
      Note that blkfront_gather_backend_features does not call
      blkfront_setup_indirect anymore (as that needs to be done per ring).
      That means that in blkif_recover/blkif_connect we have to do it in a loop
      (bounded by nr_rings).
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      3df0e505
  3. 04 Jan, 2016 1 commit
  4. 23 Dec, 2015 2 commits
  5. 26 Nov, 2015 17 commits