  1. 21 Feb, 2017: 7 commits
  2. 28 Jan, 2017: 1 commit
  3. 04 Jan, 2017: 1 commit
  4. 22 Nov, 2016: 1 commit
    • block: Return -ENOTSUP rather than assert on unaligned discards · 49228d1e
      Eric Blake committed
      Right now, the block layer rounds discard requests, so that
      individual drivers are able to assert that discard requests
      will never be unaligned.  But there are some iSCSI devices
      that track and coalesce multiple unaligned requests, turning them
      into an actual discard if the requests eventually cover an
      entire page, which implies that it is better to always pass
      discard requests as low down the stack as possible.
      
      In isolation, this patch has no semantic effect, since the
      block layer currently never passes an unaligned request through.
      The block layer also already has code that silently ignores
      drivers that return -ENOTSUP for a discard request that cannot
      be honored (as well as drivers that return 0 even when nothing
      was done).  But the next patch will update the block layer to
      fragment discard requests, so that clients are guaranteed that
      they are either dealing with an unaligned head or tail, or an
      aligned core, making it similar to the block layer semantics of
      write zero fragmentation.
      
      CC: qemu-stable@nongnu.org
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
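Commit 49228d1e mentions a follow-up that fragments a discard into an unaligned head, an aligned core, and an unaligned tail. A minimal sketch of that splitting arithmetic follows, assuming non-negative byte offsets; the `split_discard()` helper and the `Fragment` type are hypothetical illustrations, not QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* One piece of a split discard request (hypothetical type). */
typedef struct {
    int64_t offset;
    int64_t bytes;
} Fragment;

/*
 * Split [offset, offset + bytes) into at most three fragments: an
 * unaligned head, an aligned core, and an unaligned tail. Returns the
 * number of fragments written to out[].
 */
static int split_discard(int64_t offset, int64_t bytes, int64_t align,
                         Fragment out[3])
{
    assert(offset >= 0 && bytes >= 0 && align > 0);
    int n = 0;
    int64_t end = offset + bytes;
    /* First aligned boundary at or after offset, last one at or before end. */
    int64_t core_start = (offset + align - 1) / align * align;
    int64_t core_end = end / align * align;

    if (core_start >= core_end) {
        /* No whole aligned block inside: the entire request is unaligned. */
        if (bytes > 0) {
            out[n++] = (Fragment){ offset, bytes };
        }
        return n;
    }
    if (offset < core_start) {
        out[n++] = (Fragment){ offset, core_start - offset };   /* head */
    }
    out[n++] = (Fragment){ core_start, core_end - core_start };  /* core */
    if (core_end < end) {
        out[n++] = (Fragment){ core_end, end - core_end };       /* tail */
    }
    return n;
}
```

With a 4096-byte alignment, a 10000-byte discard at offset 1000 becomes a 3096-byte head, a 4096-byte core at offset 4096, and a 2808-byte tail at offset 8192; only the core is aligned.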
  5. 24 Oct, 2016: 2 commits
    • block/iscsi: Adding new iSER transport layer option · e0ae4987
      Roy Shterman committed
      iSER is a new transport layer supported in Libiscsi.
      iSER provides a zero-copy, RDMA-capable interface that can
      improve performance.
      
      In order to use the new iSER transport, one needs RDMA-capable
      hardware and must choose iser as the protocol name in the Libiscsi URI.
      
      For now, iSER memory buffers are pre-allocated and pre-registered,
      hence in order to work with iSER from QEMU, one needs to raise the
      VM's MEMLOCK limit so that it is large enough for all iSER buffers
      and RDMA resources.
      Signed-off-by: Roy Shterman <roysh@mellanox.com>
      Message-Id: <1476000896-18632-3-git-send-email-roysh@mellanox.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • block/iscsi: Introducing new zero-copy API · 583ec22e
      Roy Shterman committed
      This adds a new API for zero-copy command submission. The new API takes
      a list of I/O vectors and the number of I/O vectors to submit as input
      parameters when initiating the command. The new API must be used when
      working with the iSER transport option.
      Signed-off-by: Roy Shterman <roysh@mellanox.com>
      Message-Id: <1476000896-18632-2-git-send-email-roysh@mellanox.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 07 Oct, 2016: 1 commit
  7. 23 Sep, 2016: 2 commits
    • util: Add UUID API · cea25275
      Fam Zheng committed
      A number of different places across the code base use CONFIG_UUID. Some
      of them are soft dependencies, some are not built if libuuid is not
      available, some come with a dummy fallback, and some throw a runtime
      error.
      
      This is hard to maintain, and hard for users to reason about.
      
      Since UUID is a simple standard with only a small number of operations,
      it is cleaner to have central support in libqemuutil. This patch adds
      qemu_uuid_* functions that all uuid users in the code base can
      rely on. Except for qemu_uuid_generate, which is new code, all other
      functions are just copied from existing fallbacks in other files.
      
      Note that qemu_uuid_parse is moved without updating the function
      signature to use QemuUUID, to keep this patch simple.
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Message-Id: <1474432046-325-2-git-send-email-famz@redhat.com>
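For readers unfamiliar with what a generator such as `qemu_uuid_generate` has to produce, here is a minimal sketch of RFC 4122 version-4 (random) UUID generation and canonical formatting. The `ExampleUUID` type and both functions are hypothetical stand-ins, not QEMU's `QemuUUID` API, and `rand()` is used only to keep the sketch self-contained; a real generator should draw from a strong random source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical 16-byte UUID type (not QEMU's QemuUUID). */
typedef struct {
    uint8_t data[16];
} ExampleUUID;

/* Fill with random bytes, then stamp the version and variant fields
 * that RFC 4122 requires for a version-4 UUID. */
static void example_uuid_generate(ExampleUUID *out)
{
    for (int i = 0; i < 16; i++) {
        out->data[i] = (uint8_t)(rand() & 0xff);  /* not crypto-quality */
    }
    out->data[6] = (uint8_t)((out->data[6] & 0x0f) | 0x40);  /* version 4 */
    out->data[8] = (uint8_t)((out->data[8] & 0x3f) | 0x80);  /* variant 10xx */
}

/* Format as the canonical 36-character string plus the NUL terminator. */
static void example_uuid_unparse(const ExampleUUID *u, char out[37])
{
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             u->data[0], u->data[1], u->data[2], u->data[3],
             u->data[4], u->data[5], u->data[6], u->data[7],
             u->data[8], u->data[9], u->data[10], u->data[11],
             u->data[12], u->data[13], u->data[14], u->data[15]);
}
```

The version nibble always prints as `4` at string position 14, and the variant character at position 19 is one of `8`, `9`, `a`, or `b`.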
    • iscsi: Fix divide-by-zero regression on raw SG devices · 95eaa785
      Eric Blake committed
      When qemu uses iscsi devices in sg mode, iscsilun->block_size
      is left at 0.  Prior to commits cf081fca and similar, when
      block limits were tracked in sectors, this did not matter:
      various block limits were just left at 0.  But when we started
      scaling by block size, this caused SIGFPE.
      
      Then, in a later patch, commit a5b8dd2c added an assertion to
      bdrv_open_common() that request_alignment is always non-zero,
      which was not true for SG mode.  Rather than relax that assertion,
      we can just provide a sane value (we don't know of any SG device
      with a block size smaller than qemu's default sizing of 512 bytes).
      
      One possible solution for SG mode is to just blindly skip ALL
      of iscsi_refresh_limits(), since we already short circuit so
      many other things in sg mode.  But this patch takes a slightly
      more conservative approach, and merely guarantees that scaling
      will succeed, while still using multiples of the original size
      where possible.  Resulting limits may still be zero in SG mode
      (that is, we mostly only fix uses of block_size as a denominator
      or in values that affect assertions, not all uses).
      Reported-by: Holger Schranz <holger@fam-schranz.de>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      CC: qemu-stable@nongnu.org
      
      Message-Id: <1473283640-15756-1-git-send-email-eblake@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
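The core of the fix in commit 95eaa785 is to never divide by, or align to, a block size of 0. A minimal sketch under that reading follows; the helper names are hypothetical and this is not the actual `iscsi_refresh_limits()` code.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_DEFAULT_BLOCK_SIZE 512u  /* qemu's default sizing */

/* Hypothetical helper: never returns 0, so the result is safe to use
 * as a divisor and as a request alignment. */
static uint32_t effective_block_size(uint32_t lun_block_size)
{
    return lun_block_size ? lun_block_size : EXAMPLE_DEFAULT_BLOCK_SIZE;
}

/* Hypothetical scaling of a byte limit into LUN blocks. Dividing by the
 * raw block size instead would trap (SIGFPE) when it is 0 in SG mode. */
static uint64_t bytes_to_blocks(uint64_t bytes, uint32_t lun_block_size)
{
    return bytes / effective_block_size(lun_block_size);
}

/* Hypothetical scaling the other way; a limit of 0 simply stays 0. */
static uint64_t blocks_to_bytes(uint64_t nb_blocks, uint32_t lun_block_size)
{
    return nb_blocks * effective_block_size(lun_block_size);
}
```

As the commit notes, limits that start at 0 in SG mode remain 0 after scaling; only the divisor is guaranteed sane.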
  8. 21 Sep, 2016: 1 commit
  9. 20 Jul, 2016: 2 commits
  10. 19 Jul, 2016: 2 commits
    • block/iscsi: allow caching of the allocation map · e1123a3b
      Peter Lieven committed
      Until now the allocation map was used only as a hint as to whether
      a cluster is allocated. If a block was not allocated (or QEMU had
      no info about the allocation status), a get_block_status call was
      issued to check the allocation status and possibly avoid
      a subsequent read of unallocated sectors. If a block was known to be
      allocated, the get_block_status call was omitted. In the other case,
      a get_block_status call was issued before every read to avoid
      the need for a consistent allocation map. To avoid the
      potential overhead of calling get_block_status for each and
      every read request, this only took place for larger requests.
      
      This patch enhances this mechanism to cache the allocation
      status and avoid calling get_block_status for blocks where
      the allocation status has been queried before. This allows
      for bypassing the read request even for smaller requests and
      additionally omits calling get_block_status for blocks known to
      be unallocated.
      Signed-off-by: Peter Lieven <pl@kamp.de>
      Message-Id: <1468831940-15556-3-git-send-email-pl@kamp.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
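One way to picture the cached allocation map from commit e1123a3b is as a pair of per-cluster bitmaps: one recording whether a cluster is allocated, and one recording whether that bit is known to be valid. The sketch below is a hypothetical, simplified model (a single `uint64_t` per bitmap and invented function names), not the actual block/iscsi.c data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_MAX_CLUSTERS 64  /* one uint64_t per bitmap, for brevity */

typedef enum { ALLOC_UNKNOWN, ALLOC_ALLOCATED, ALLOC_UNALLOCATED } AllocState;

/* Hypothetical cache: bit i of 'valid' says whether bit i of 'allocated'
 * holds the answer of an earlier get_block_status query. */
typedef struct {
    uint64_t allocated;
    uint64_t valid;
} AllocMapCache;

/* Remember the answer of a get_block_status query for one cluster. */
static void cache_record(AllocMapCache *c, int cluster, bool is_allocated)
{
    assert(cluster >= 0 && cluster < EXAMPLE_MAX_CLUSTERS);
    uint64_t bit = UINT64_C(1) << cluster;
    c->valid |= bit;
    if (is_allocated) {
        c->allocated |= bit;
    } else {
        c->allocated &= ~bit;
    }
}

/* Only ALLOC_UNKNOWN still requires a get_block_status call. */
static AllocState cache_lookup(const AllocMapCache *c, int cluster)
{
    assert(cluster >= 0 && cluster < EXAMPLE_MAX_CLUSTERS);
    uint64_t bit = UINT64_C(1) << cluster;
    if (!(c->valid & bit)) {
        return ALLOC_UNKNOWN;
    }
    return (c->allocated & bit) ? ALLOC_ALLOCATED : ALLOC_UNALLOCATED;
}

/* Forget a cluster's state when the cached answer may be stale. */
static void cache_invalidate(AllocMapCache *c, int cluster)
{
    assert(cluster >= 0 && cluster < EXAMPLE_MAX_CLUSTERS);
    c->valid &= ~(UINT64_C(1) << cluster);
}
```

In this model, a read of a cluster cached as `ALLOC_UNALLOCATED` can skip both the query and the read, which matches the behavior the commit message describes for small requests.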
    • block/iscsi: fix rounding in iscsi_allocationmap_set · eb36b953
      Peter Lieven committed
      When setting clusters as allocated, the boundaries have
      to be expanded. As Paolo pointed out, the calculation of
      the number of clusters is wrong:
      
      Suppose cluster_sectors is 2, sector_num = 1, nb_sectors = 6:
      
      In the "mark allocated" case, you want to set 0..8, i.e.
      cluster_num=0, nb_clusters=4.
      
         0--.--2--.--4--.--6--.--8
         <--|_________________|-->  (<--> = expanded)
      
      Instead you are setting nb_clusters=3, so that 6..8 is not marked.
      
         0--.--2--.--4--.--6--.--8
         <--|______________|!!!     (! = wrong)
      
      Cc: qemu-stable@nongnu.org
      Reported-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Peter Lieven <pl@kamp.de>
      Message-Id: <1468831940-15556-2-git-send-email-pl@kamp.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
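The arithmetic in the example from commit eb36b953 can be checked directly. The sketch below contrasts one formula that reproduces the wrong `nb_clusters = 3` (rounding only `nb_sectors` up) with the expanded range (round the start down and the end up). The function names are hypothetical, and this is a simplified model rather than the actual patch to `iscsi_allocationmap_set()`.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up integer division for non-negative operands. */
static int64_t div_round_up(int64_t n, int64_t d)
{
    return (n + d - 1) / d;
}

/* Wrong: ignores that sector_num may start in the middle of a cluster. */
static void clusters_buggy(int64_t sector_num, int64_t nb_sectors,
                           int64_t cluster_sectors,
                           int64_t *cluster_num, int64_t *nb_clusters)
{
    assert(cluster_sectors > 0);
    *cluster_num = sector_num / cluster_sectors;
    *nb_clusters = div_round_up(nb_sectors, cluster_sectors);
}

/* Expanded: round the start down and the end up to cluster boundaries. */
static void clusters_expanded(int64_t sector_num, int64_t nb_sectors,
                              int64_t cluster_sectors,
                              int64_t *cluster_num, int64_t *nb_clusters)
{
    assert(cluster_sectors > 0);
    int64_t first = sector_num / cluster_sectors;
    int64_t end = div_round_up(sector_num + nb_sectors, cluster_sectors);
    *cluster_num = first;
    *nb_clusters = end - first;
}
```

For sector_num = 1, nb_sectors = 6, cluster_sectors = 2, `clusters_buggy` gives cluster_num = 0, nb_clusters = 3, while `clusters_expanded` gives cluster_num = 0, nb_clusters = 4, which covers the full 0..8 range in the diagram.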
  11. 13 Jul, 2016: 1 commit
    • coroutine: move entry argument to qemu_coroutine_create · 0b8b8753
      Paolo Bonzini committed
      In practice the entry argument is always known at creation time, and
      it is confusing that sometimes qemu_coroutine_enter is used with a
      non-NULL argument to re-enter a coroutine (this happens in
      block/sheepdog.c and tests/test-coroutine.c).  So pass the opaque value
      at creation time, for consistency with e.g. aio_bh_new.
      
      Mostly done with the following semantic patch:
      
      @ entry1 @
      expression entry, arg, co;
      @@
      - co = qemu_coroutine_create(entry);
      + co = qemu_coroutine_create(entry, arg);
        ...
      - qemu_coroutine_enter(co, arg);
      + qemu_coroutine_enter(co);
      
      @ entry2 @
      expression entry, arg;
      identifier co;
      @@
      - Coroutine *co = qemu_coroutine_create(entry);
      + Coroutine *co = qemu_coroutine_create(entry, arg);
        ...
      - qemu_coroutine_enter(co, arg);
      + qemu_coroutine_enter(co);
      
      @ entry3 @
      expression entry, arg;
      @@
      - qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
      + qemu_coroutine_enter(qemu_coroutine_create(entry, arg));
      
      @ reentry @
      expression co;
      @@
      - qemu_coroutine_enter(co, NULL);
      + qemu_coroutine_enter(co);
      
      except for the aforementioned few places where the semantic patch
      stumbled (as expected) and for test_co_queue, which would otherwise
      produce an uninitialized variable warning.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  12. 12 Jul, 2016: 1 commit
  13. 05 Jul, 2016: 6 commits
  14. 29 Jun, 2016: 1 commit
  15. 08 Jun, 2016: 3 commits
    • iscsi: Convert to bdrv_co_pwrite_zeroes() · 94d047a3
      Eric Blake committed
      Another step on our continuing quest to switch to byte-based
      interfaces.
      
      As this is the first byte-based iscsi interface, convert
      is_request_lun_aligned() into two versions, one for sectors
      and one for bytes.  Also, change from outright -EINVAL failure
      on an unaligned request, to instead failing with -ENOTSUP to
      trigger a read-modify-write fallback, particularly since the
      block layer should be honoring bs->request_alignment to avoid
      -EINVAL on read/write requests.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
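Commit 94d047a3 describes a byte-based alignment check and reporting unaligned zero writes with -ENOTSUP so the block layer falls back to read-modify-write. A minimal sketch of that convention follows; `example_is_byte_request_lun_aligned()` and `example_pwrite_zeroes()` are hypothetical names loosely modelled on the description, not the actual iscsi driver functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* True if both the start and the length fall on LUN block boundaries. */
static bool example_is_byte_request_lun_aligned(int64_t offset, int64_t bytes,
                                                uint32_t block_size)
{
    assert(block_size > 0);
    return offset % block_size == 0 && bytes % block_size == 0;
}

/* Hypothetical zero-write entry point: an unaligned request returns
 * -ENOTSUP (not -EINVAL) so that the caller can fall back to writing
 * explicit zeroes via read-modify-write instead of failing outright. */
static int example_pwrite_zeroes(int64_t offset, int64_t bytes,
                                 uint32_t block_size)
{
    if (!example_is_byte_request_lun_aligned(offset, bytes, block_size)) {
        return -ENOTSUP;
    }
    /* ... a real driver would issue WRITE SAME for bytes / block_size
     * blocks here; this sketch just reports success ... */
    return 0;
}
```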
    • block: Track write zero limits in bytes · cf081fca
      Eric Blake committed
      Another step towards removing sector-based interfaces: convert
      the maximum write and minimum alignment values from sectors to
      bytes.  Rename the variables to let the compiler check that all
      users are converted to the new semantics.
      
      The maximum remains an int as long as BDRV_REQUEST_MAX_SECTORS
      is constrained by INT_MAX (this means that we can't even
      support a 2G write_zeroes, but just under it) - changing
      operation lengths to unsigned or to 64-bits is a much bigger
      audit, and debatable if we even want to do it (since at the
      core, a 32-bit platform will still have ssize_t as its
      underlying limit on write()).
      
      Meanwhile, alignment is changed to 'uint32_t', since it makes no
      sense to have an alignment larger than the maximum write, and
      less painful to use an unsigned type with well-defined behavior
      in bit operations than to have to worry about what happens if
      a driver mistakenly supplies a negative alignment.
      
      Add an assert that no one was trying to use sectors to get a
      write zeroes larger than 2G, and therefore that a later conversion
      to bytes won't be impacted by keeping the limit at 32 bits.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
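The sector-to-byte conversion and its 32-bit guard from commit cf081fca can be sketched as below, assuming 512-byte sectors. `example_max_pwrite_zeroes_bytes()` is a hypothetical helper (the actual patch renames variables instead), and the power-of-two check only illustrates why an unsigned alignment type makes bit operations well defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define EXAMPLE_SECTOR_SIZE 512  /* assumption: 512-byte sectors */

/* Convert a write-zeroes limit from sectors to bytes, keeping it an int.
 * The assertion encodes that nobody relied on a limit of 2G or more, so
 * the byte value still fits in 32 bits (the maximum is just under 2G). */
static int example_max_pwrite_zeroes_bytes(int max_sectors)
{
    assert(max_sectors >= 0);
    assert(max_sectors <= INT_MAX / EXAMPLE_SECTOR_SIZE);
    return max_sectors * EXAMPLE_SECTOR_SIZE;
}

/* With an unsigned alignment, this mask test has well-defined behavior;
 * with a signed type, a negative value would make it meaningless. */
static int example_is_power_of_two(uint32_t alignment)
{
    return alignment != 0 && (alignment & (alignment - 1)) == 0;
}
```

The largest accepted value, `INT_MAX / 512` sectors, converts to 2147483136 bytes, which is "just under" 2G as the commit message says.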
    • iscsi: Use block size as minimum zero/discard alignment · 8b184744
      Eric Blake committed
      If hardware does not advertise a minimum zero/discard
      alignment, we still want to guarantee that the block layer
      will align requests to our blocks, rather than the arbitrary
      512-byte BDRV sector size.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
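In sketch form, the fallback from commit 8b184744 picks the LUN block size when the hardware advertises nothing. This assumes a hypothetical advertised granularity expressed in LUN blocks, with 0 meaning "not advertised"; the function name is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Choose the zero/discard alignment (in bytes) the block layer should
 * honor: the hardware hint if present, otherwise one LUN block rather
 * than the generic 512-byte sector. */
static uint32_t example_zero_discard_alignment(uint32_t advertised_blocks,
                                               uint32_t block_size)
{
    assert(block_size > 0);
    if (advertised_blocks) {
        return advertised_blocks * block_size;  /* hardware hint */
    }
    return block_size;  /* fallback: one LUN block */
}
```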
  16. 29 May, 2016: 1 commit
  17. 23 May, 2016: 1 commit
  18. 12 May, 2016: 3 commits
    • block: Honor BDRV_REQ_FUA during write_zeroes · 465fe887
      Eric Blake committed
      The block layer has a couple of cases where it can lose
      Force Unit Access semantics when writing a large block of
      zeroes, such that the request returns before the zeroes
      have been guaranteed to land on underlying media.
      
      SCSI does not support FUA during WRITESAME(10/16); FUA is only
      supported if it falls back to WRITE(10/16).  But where the
      underlying device is new enough to not need a fallback, it
      means that any upper layer request with FUA semantics was
      silently ignoring BDRV_REQ_FUA.
      
      Conversely, NBD has situations where it can support FUA but not
      ZERO_WRITE; when that happens, the generic block layer fallback
      to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu
      2.6) was losing the FUA flag.
      
      The problem of losing flags unrelated to ZERO_WRITE has been
      latent in bdrv_co_do_write_zeroes() since commit aa7bfbff, but
      back then, it did not matter because there was no FUA flag.  It
      became observable when commit 93f5e6d8 paved the way for flags
      that can impact correctness, when we should have been using
      bdrv_co_writev_flags() with modified flags.  Compare to commit
      9eeb6dd1, which got flag manipulation right in
      bdrv_co_do_zero_pwritev().
      
      Symptoms: I tested with qemu-io with default writethrough cache
      (which is supposed to use FUA semantics on every write), and
      targeted an NBD client connected to a server that intentionally
      did not advertise NBD_FLAG_SEND_FUA.  When doing 'write 0 512',
      the NBD client sent two operations (NBD_CMD_WRITE then
      NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing
      'write -z 0 512', the NBD client sent only NBD_CMD_WRITE.
      
      The fix is to do a cleanup bdrv_co_flush() at the end of the
      operation if any step in the middle relied on a BDS that does
      not natively support FUA for that step (note that we don't
      need to flush after every operation, if the operation is broken
      into chunks based on bounce-buffer sizing).  Each BDS gains a
      new flag .supported_zero_flags, which parallels the use of
      .supported_write_flags but only when accessing a zero write
      operation (the flags MUST be different, because of SCSI having
      different semantics based on WRITE vs. WRITESAME; and also
      because BDRV_REQ_MAY_UNMAP only makes sense on zero writes).
      
      Also fix some documentation to describe -ENOTSUP semantics,
      particularly since iscsi depends on those semantics.
      
      Down the road, we may want to add a driver where its
      .bdrv_co_pwritev() honors all three of BDRV_REQ_FUA,
      BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise
      this via bs->supported_write_flags for blocks opened by that
      driver; such a driver should NOT supply .bdrv_co_write_zeroes
      nor .supported_zero_flags.  But none of the drivers touched
      in this patch want to do that (the act of writing zeroes is
      different enough from normal writes to deserve a second
      callback).
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
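The flush-at-the-end strategy from commit 465fe887 can be modelled as below. Everything here is hypothetical (the `ExampleBDS` struct, the flag value, and callbacks that only count calls). It shows just the decision logic: strip FUA from any step the device cannot honor natively, remember that this happened, and issue a single flush after the whole chunked operation rather than after each chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { EX_REQ_FUA = 1u << 0 };  /* hypothetical flag value */

typedef struct {
    unsigned supported_zero_flags;  /* flags the zero-write path honors */
    int zero_calls;                 /* counters stand in for real I/O */
    int flush_calls;
} ExampleBDS;

static int ex_zero_chunk(ExampleBDS *bs, int64_t off, int64_t len,
                         unsigned flags)
{
    (void)off;
    (void)len;
    /* The device must only ever see flags it claims to support. */
    assert((flags & ~bs->supported_zero_flags) == 0);
    bs->zero_calls++;
    return 0;
}

static int ex_flush(ExampleBDS *bs)
{
    bs->flush_calls++;
    return 0;
}

/* Write zeroes in chunks; emulate FUA with one trailing flush if needed. */
static int ex_pwrite_zeroes(ExampleBDS *bs, int64_t off, int64_t len,
                            int64_t max_chunk, unsigned flags)
{
    bool need_flush = false;
    assert(max_chunk > 0);
    while (len > 0) {
        int64_t n = len < max_chunk ? len : max_chunk;
        unsigned chunk_flags = flags & bs->supported_zero_flags;
        if ((flags & EX_REQ_FUA) && !(chunk_flags & EX_REQ_FUA)) {
            need_flush = true;  /* this step lost FUA semantics */
        }
        int ret = ex_zero_chunk(bs, off, n, chunk_flags);
        if (ret < 0) {
            return ret;
        }
        off += n;
        len -= n;
    }
    return need_flush ? ex_flush(bs) : 0;
}
```

A 10-byte request with 4-byte chunks takes three zero-write calls; the flush happens once, and only when FUA was requested but not natively supported.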
    • block: Make supported_write_flags a per-bds property · 4df863f3
      Eric Blake committed
      Pre-patch, .supported_write_flags lives at the driver level, which
      means we are blindly declaring that all block devices using a
      given driver will either equally support FUA, or that we need a
      fallback at the block layer.  But there are drivers where FUA
      support is a per-block decision: the NBD block driver is dependent
      on the remote server advertising NBD_FLAG_SEND_FUA (and has
      fallback code to duplicate the flush that the block layer would do
      if NBD had not set .supported_write_flags); and the iscsi block
      driver is dependent on the mode sense bits advertised by the
      underlying device (and is currently silently ignoring FUA requests
      if the underlying device does not support FUA).
      
      The fix is to make the supported flags a per-BDS option, set during
      .bdrv_open().  This patch moves the variable and fixes NBD and iscsi
      to set it only conditionally; later patches will then further
      simplify the NBD driver to quit duplicating work done at the block
      layer, as well as tackle the fact that SCSI does not support FUA
      semantics on WRITESAME(10/16) but only on WRITE(10/16).
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
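A minimal model of commit 4df863f3's move from a per-driver to a per-device flag follows. The struct, the flag value, and `example_nbd_open()` are hypothetical; the point is only that the flag is decided for each opened device at open time, based on what the remote side advertises, and the generic layer consults that per-device value.

```c
#include <assert.h>
#include <stdbool.h>

enum { EX_REQ_FUA = 1u << 0 };  /* hypothetical flag value */

typedef struct {
    unsigned supported_write_flags;  /* per device, filled in at open */
} ExampleBDS;

/* Hypothetical open routine: claim FUA only if the server advertises it. */
static void example_nbd_open(ExampleBDS *bs, bool server_sends_fua)
{
    bs->supported_write_flags = server_sends_fua ? EX_REQ_FUA : 0;
}

/* Generic-layer decision: does this write need an emulated flush? */
static bool example_write_needs_flush(const ExampleBDS *bs, unsigned flags)
{
    return (flags & EX_REQ_FUA) && !(bs->supported_write_flags & EX_REQ_FUA);
}
```

Two devices opened through the same driver can now differ: one server that advertises FUA needs no extra flush, while another that does not gets the block-layer fallback.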
    • block: Introduce bdrv_driver_pwritev() · 78a07294
      Kevin Wolf committed
      This is a function that simply calls into the block driver for doing a
      write, providing the byte granularity interface we want to eventually
      have everywhere, and using whatever interface that driver supports.
      
      This one is a bit more interesting than the version for reads: It adds
      support for .bdrv_co_writev_flags() everywhere, so that drivers
      implementing this function can drop .bdrv_co_writev() now.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
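The dispatch idea in commit 78a07294 can be sketched as a byte-granularity wrapper that converts to sectors and prefers a flags-aware callback when the driver provides one. The `ExampleDriver` table, the demo callbacks, and the sector-alignment precondition are assumptions for illustration; the real callbacks also take the device state and an I/O vector and run as coroutines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_SECTOR_SIZE 512  /* assumption: 512-byte sectors */

/* Hypothetical driver table; either callback may be NULL. */
typedef struct {
    int (*writev_flags)(int64_t sector_num, int nb_sectors, unsigned flags);
    int (*writev)(int64_t sector_num, int nb_sectors);
} ExampleDriver;

/* Byte-granularity wrapper over sector-based callbacks. This sketch
 * assumes the caller has already aligned the request to sectors. */
static int example_driver_pwritev(const ExampleDriver *drv, int64_t offset,
                                  int64_t bytes, unsigned flags)
{
    assert(offset % EX_SECTOR_SIZE == 0 && bytes % EX_SECTOR_SIZE == 0);
    int64_t sector_num = offset / EX_SECTOR_SIZE;
    int nb_sectors = (int)(bytes / EX_SECTOR_SIZE);

    if (drv->writev_flags) {
        /* Preferred: the driver sees the flags and can honor them. */
        return drv->writev_flags(sector_num, nb_sectors, flags);
    }
    /* Fallback: flags cannot be passed down; a caller would have to
     * emulate them (for example FUA via a later flush). */
    assert(drv->writev != NULL);
    return drv->writev(sector_num, nb_sectors);
}

/* Demo callbacks that record their arguments (illustration only). */
static int64_t demo_last_sector;
static int demo_last_count;
static unsigned demo_last_flags;

static int demo_writev_flags(int64_t sector_num, int nb_sectors, unsigned flags)
{
    demo_last_sector = sector_num;
    demo_last_count = nb_sectors;
    demo_last_flags = flags;
    return 1;  /* distinct return value marks which path ran */
}

static int demo_writev(int64_t sector_num, int nb_sectors)
{
    demo_last_sector = sector_num;
    demo_last_count = nb_sectors;
    return 2;
}
```

A driver that fills in only `writev_flags` is fully served by this wrapper, which mirrors the commit's note that such drivers can drop the plain write callback.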
  19. 30 Mar, 2016: 2 commits
  20. 01 Mar, 2016: 1 commit