1. 11 Jul, 2017 2 commits
  2. 26 Jun, 2017 3 commits
  3. 29 May, 2017 1 commit
  4. 11 May, 2017 4 commits
    • E
      qcow2: Discard/zero clusters by byte count · d2cb36af
      Eric Blake authored
      Passing a byte offset, but sector count, when we ultimately
      want to operate on cluster granularity, is madness.  Clean up
      the external interfaces to take both offset and count as bytes,
      while still keeping the assertion added previously that the
      caller must align the values to a cluster.  Then rename things
      to make sure backports don't get confused by changed units:
      instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
      we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().
      
      The internal functions still operate on clusters at a time, and
      return an int for number of cleared clusters; but on an image
      with 2M clusters, a single L2 table holds 256k entries that each
      represent a 2M cluster, totalling well over INT_MAX bytes if we
      ever had a request for that many bytes at once.  All our callers
      currently limit themselves to 32-bit bytes (and therefore fewer
      clusters), but by making this function 64-bit clean, we have one
      less place to clean up if we later improve the block layer to
      support 64-bit bytes through all operations (with the block layer
      auto-fragmenting on behalf of more-limited drivers), rather than
      the current state where some interfaces are artificially limited
      to INT_MAX at a time.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170507000552.20847-13-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      d2cb36af
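
The arithmetic in the message above can be checked with a small sketch. It assumes 8-byte L2 entries, which matches the "256k entries" figure quoted for 2M clusters. The helper names `l2_table_span_bytes` and `is_cluster_aligned` are hypothetical and are not QEMU functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: bytes of guest data covered by one full L2 table,
 * assuming every L2 entry occupies 8 bytes. */
static uint64_t l2_table_span_bytes(uint64_t cluster_size)
{
    uint64_t entries = cluster_size / sizeof(uint64_t);
    return entries * cluster_size;
}

/* Hypothetical helper: returns 1 when both the byte offset and the byte
 * count are multiples of the (power-of-two) cluster size. */
static int is_cluster_aligned(uint64_t offset, uint64_t bytes,
                              uint64_t cluster_size)
{
    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);
    return ((offset | bytes) & (cluster_size - 1)) == 0;
}
```

With 2 MiB clusters, `l2_table_span_bytes()` yields 2^39 bytes (512 GiB), far more than `INT_MAX`, which is why a byte count for such a request needs a 64-bit type.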
    • E
      qcow2: Optimize write zero of unaligned tail cluster · fbaa6bb3
      Eric Blake authored
      We've already improved discards to operate efficiently on the tail
      of an unaligned qcow2 image; it's time to make a similar improvement
      to write zeroes.  The special case is only valid at the tail
      cluster of a file, where we must recognize that any sectors beyond
      the image end would implicitly read as zero, and therefore should
      not penalize our logic for widening a partial cluster into writing
      the whole cluster as zero.
      
      However, note that for now, the special case of end-of-file is only
      recognized if there is no backing file, or if the backing file has
      the same length; that's because when the backing file is shorter
      than the active layer, we don't have code in place to recognize
      that reads of a sector unallocated at the top and beyond the backing
      end-of-file are implicitly zero.  It's not much of a real loss,
      because most people don't use images that aren't cluster-aligned,
      or where the active layer is a different size than the backing
      layer (especially where the difference falls within a single cluster).
      
      Update test 154 to cover the new scenarios, using two images of
      intentionally differing length.
      
      While at it, fix the test to gracefully skip when run as
      ./check -qcow2 -o compat=0.10 154
      since the older format lacks zero clusters already required earlier
      in the test.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170507000552.20847-11-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      fbaa6bb3
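
A simplified sketch of the tail-cluster idea described above, assuming an image with no backing file (or a backing file of the same length), so that bytes past the image end read as zero. The function `widen_zero_tail` is hypothetical; the real driver logic also inspects the existing cluster contents, which this sketch leaves out.

```c
#include <stdint.h>

/* Hypothetical helper: if a write-zeroes request starts on a cluster
 * boundary and ends exactly at an unaligned image end, widen it to cover
 * the whole tail cluster. Otherwise return the byte count unchanged. */
static uint64_t widen_zero_tail(uint64_t offset, uint64_t bytes,
                                uint64_t image_size, uint64_t cluster_size)
{
    uint64_t end = offset + bytes;

    if (offset % cluster_size == 0 && end == image_size &&
        end % cluster_size != 0) {
        /* Round the end up to the next cluster boundary. */
        uint64_t rounded_end =
            (end + cluster_size - 1) / cluster_size * cluster_size;
        return rounded_end - offset;
    }
    return bytes;
}
```

For a 64 KiB cluster size and an image that ends 1024 bytes into its second cluster, a 1024-byte request at offset 65536 is widened to a full 65536-byte cluster, while a request that stops short of the image end is left alone.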
    • E
      qcow2: Make distinction between zero cluster types obvious · fdfab37d
      Eric Blake authored
      Treat plain zero clusters differently from allocated ones, so that
      we can simplify the logic of checking whether an offset is present.
      Do this by splitting QCOW2_CLUSTER_ZERO into two new enums,
      QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC.
      
      I tried to arrange the enum so that we could use
      'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and
      'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although
      I didn't actually end up taking advantage of the layout.
      
      In many cases, this leads to simpler code, by properly combining
      cases (sometimes, both zero types pair together, other times,
      plain zero is more like unallocated while allocated zero is more
      like normal).
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-id: 20170507000552.20847-7-eblake@redhat.com
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      fdfab37d
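
The enum layout described above can be illustrated as follows. Only the two zero-cluster names come from the commit message; the `SKETCH_` prefix, the other enumerators, and both helper functions are assumptions made for this example and do not reproduce the QEMU header.

```c
#include <stdbool.h>

/* Illustrative ordering: types without a host offset sort at or below
 * ZERO_PLAIN, types with a host offset sort at or above ZERO_ALLOC. */
typedef enum {
    SKETCH_CLUSTER_UNALLOCATED,
    SKETCH_CLUSTER_ZERO_PLAIN,
    SKETCH_CLUSTER_ZERO_ALLOC,
    SKETCH_CLUSTER_NORMAL,
    SKETCH_CLUSTER_COMPRESSED,
} SketchClusterType;

/* Case where plain zero pairs with unallocated and allocated zero pairs
 * with normal clusters. */
static bool sketch_has_host_offset(SketchClusterType t)
{
    return t >= SKETCH_CLUSTER_ZERO_ALLOC;
}

/* Case where both zero types pair together. */
static bool sketch_reads_as_zero(SketchClusterType t)
{
    return t == SKETCH_CLUSTER_ZERO_PLAIN || t == SKETCH_CLUSTER_ZERO_ALLOC;
}
```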
    • M
      qcow2: Fix preallocation size formula · 92413c16
      Max Reitz authored
      When calculating the number of reftable entries, we should actually use
      the number of refblocks and not (wrongly[1]) re-calculate it.
      
      [1] "Wrongly" means: Dividing the number of clusters by the number of
          entries per refblock and rounding down instead of up.
      Reported-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      92413c16
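
The rounding issue from footnote [1] can be shown with a short sketch. The function names are hypothetical and the numbers are illustrative; this is not the preallocation code itself.

```c
#include <stdint.h>

/* Integer division that rounds up; d must be non-zero. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/* Number of refblocks needed to describe `clusters` clusters. */
static uint64_t refblocks_needed(uint64_t clusters,
                                 uint64_t entries_per_refblock)
{
    return div_round_up(clusters, entries_per_refblock);
}

/* The "wrong" recalculation: rounds down and can undercount by one. */
static uint64_t refblocks_rounded_down(uint64_t clusters,
                                       uint64_t entries_per_refblock)
{
    return clusters / entries_per_refblock;
}
```

With 32768 entries per refblock, 32769 clusters need two refblocks, whereas the rounded-down calculation reports only one, so a reftable sized from it would come up one entry short.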
  5. 09 May, 2017 1 commit
  6. 28 Apr, 2017 4 commits
  7. 01 Mar, 2017 4 commits
    • K
      block: Add BDRV_O_RESIZE for blk_new_open() · 55880601
      Kevin Wolf authored
      blk_new_open() is a convenience function that processes flags rather
      than QDict options as a simple way to just open an image file.
      
      In order to keep it convenient in the future, it must automatically
      request the necessary permissions. This can easily be inferred from the
      flags for read and write, but we need another flag that tells us whether
      to get the resize permission.
      
      We can't just always request it because that means that no block jobs
      can run on the resulting BlockBackend (which is something that e.g.
      qemu-img commit wants to do), but we also can't never request it because
      most of the .bdrv_create() implementations call blk_truncate().
      
      The solution is to introduce another flag that is passed by all users
      that want to resize the image.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Acked-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      55880601
    • K
      block: Add error parameter to blk_insert_bs() · d7086422
      Kevin Wolf authored
      Now that blk_insert_bs() requests the BlockBackend permissions for the
      node it attaches to, it can fail. Instead of aborting, pass the errors
      to the callers.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Acked-by: Fam Zheng <famz@redhat.com>
      d7086422
    • K
      block: Add permissions to blk_new() · 6d0eb64d
      Kevin Wolf authored
      We want every user to be specific about the permissions it needs, so
      we'll pass the initial permissions as parameters to blk_new(). A user
      only needs to call blk_set_perm() if it wants to change the permissions
      after the fact.
      
      The permissions are stored in the BlockBackend and applied whenever a
      BlockDriverState should be attached in blk_insert_bs().
      
      This does not include actually choosing the right set of permissions
      everywhere yet. Instead, the usual FIXME comment is added to each place
      and will be addressed in individual patches.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Acked-by: Fam Zheng <famz@redhat.com>
      6d0eb64d
    • K
      block: Request child permissions in format drivers · 862f215f
      Kevin Wolf authored
      This makes use of the .bdrv_child_perm() implementation for formats that
      we just added. All format drivers expose the permissions they actually
      need now, so that they can be set accordingly and updated when parents
      are attached or detached.
      
      The only format not included here is raw, which was already converted
      with the other filter drivers.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Acked-by: Fam Zheng <famz@redhat.com>
      862f215f
  8. 24 Feb, 2017 3 commits
    • K
      block: Attach bs->file only during .bdrv_open() · 4e4bf5c4
      Kevin Wolf authored
      The way that attaching bs->file worked was a bit unusual in that it was
      the only child that would be attached to a node which is not opened yet.
      Because of this, the block layer couldn't know yet which permissions the
      driver would eventually need.
      
      This patch moves the point where bs->file is attached to the beginning
      of the individual .bdrv_open() implementations, so drivers already know
      what they are going to do with the child. This is also more consistent
      with how driver-specific children work.
      
      For a moment, bdrv_open() gets its own BdrvChild to perform image
      probing, but instead of directly assigning this BdrvChild to the BDS, it
      becomes a temporary one and the node name is passed as an option to the
      drivers, so that they can simply use bdrv_open_child() to create another
      reference for their own use.
      
      This duplicated child for (the not opened yet) bs is not the final
      state; a follow-up patch will change the image probing code to use a
      BlockBackend, which is completely independent of bs.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      4e4bf5c4
    • K
      block: Pass BdrvChild to bdrv_truncate() · 52cdbc58
      Kevin Wolf authored
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      52cdbc58
    • K
      qcow2: Use BB for resizing in qcow2_amend_options() · 70b27f36
      Kevin Wolf authored
      In order to be able to convert bdrv_truncate() to take a BdrvChild and
      later to correctly check the resize permission here, we need to use a
      BlockBackend for resizing the image.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      70b27f36
  9. 12 Feb, 2017 1 commit
    • A
      qcow2: Optimize the refcount-block overlap check · 7061a078
      Alberto Garcia authored
      The metadata overlap checks introduced in a40f1c2a help detect
      corruption in the qcow2 image by verifying that data writes don't
      overlap with existing metadata sections.
      
      The 'refcount-block' check in particular iterates over the refcount
      table in order to get the addresses of all refcount blocks and check
      that none of them overlap with the region where we want to write.
      
      The problem with the refcount table is that since it always occupies
      complete clusters its size is usually very big. With the default
      values of cluster_size=64KB and refcount_bits=16 this table holds 8192
      entries, each one of them enough to map 2GB worth of host clusters.
      
      So unless we're using images with several TB of allocated data this
      table is going to be mostly empty, and iterating over it is a waste of
      CPU. If the storage backend is fast enough this can have an effect on
      I/O performance.
      
      This patch keeps the index of the last used (i.e. non-zero) entry in
      the refcount table and updates it every time the table changes. The
      refcount-block overlap check then uses that index instead of reading
      the whole table.
      
      In my tests with a 4GB qcow2 file stored in RAM this doubles the
      amount of write IOPS.
      Signed-off-by: Alberto Garcia <berto@igalia.com>
      Message-id: 20170201123828.4815-1-berto@igalia.com
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      7061a078
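
The figures quoted above (8192 table entries, 2GB mapped per entry) and the "last used index" idea can be reproduced with a sketch. It assumes 8-byte refcount table entries; all function names are hypothetical, and the linear scan shown here stands in for whatever bookkeeping the driver actually performs when the table changes.

```c
#include <stdint.h>

/* Entries in a refcount table that occupies one cluster,
 * assuming 8 bytes per entry. */
static uint64_t reftable_entries_per_cluster(uint64_t cluster_size)
{
    return cluster_size / 8;
}

/* Bytes of host clusters described by one refcount block. */
static uint64_t bytes_mapped_per_refblock(uint64_t cluster_size,
                                          unsigned refcount_bits)
{
    uint64_t entries = cluster_size * 8 / refcount_bits;
    return entries * cluster_size;
}

/* Index of the last non-zero table entry, or -1 if the table is empty.
 * An overlap check only needs to visit entries up to this index. */
static int64_t max_used_index(const uint64_t *table, uint64_t size)
{
    int64_t i;

    for (i = (int64_t)size - 1; i >= 0; i--) {
        if (table[i] != 0) {
            return i;
        }
    }
    return -1;
}
```

For `cluster_size=64KB` and `refcount_bits=16` this gives 8192 table entries and 2 GiB per refcount block, so a 4GB image uses only the first few entries and the remaining thousands never need to be visited.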
  10. 06 Dec, 2016 1 commit
  11. 25 Nov, 2016 1 commit
  12. 22 Nov, 2016 2 commits
    • E
      block: Return -ENOTSUP rather than assert on unaligned discards · 49228d1e
      Eric Blake authored
      Right now, the block layer rounds discard requests, so that
      individual drivers are able to assert that discard requests
      will never be unaligned.  But there are some iSCSI devices
      that track and coalesce multiple unaligned requests, turning them
      into an actual discard if the requests eventually cover an
      entire page, which implies that it is better to always pass
      discard requests as low down the stack as possible.
      
      In isolation, this patch has no semantic effect, since the
      block layer currently never passes an unaligned request through.
      But the block layer already has code that silently ignores
      drivers that return -ENOTSUP for a discard request that cannot
      be honored (as well as drivers that return 0 even when nothing
      was done).  But the next patch will update the block layer to
      fragment discard requests, so that clients are guaranteed that
      they are either dealing with an unaligned head or tail, or an
      aligned core, making it similar to the block layer semantics of
      write zero fragmentation.
      
      CC: qemu-stable@nongnu.org
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      49228d1e
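
The head/core/tail fragmentation mentioned above, together with a driver that rejects unaligned requests, might look like the following sketch. `DiscardSplit`, `split_discard` and `driver_discard` are hypothetical names; only the use of `-ENOTSUP` for unaligned requests is taken from the commit message.

```c
#include <errno.h>
#include <stdint.h>

/* Byte lengths of the three fragments of a discard request. */
typedef struct {
    uint64_t head;  /* unaligned bytes before the first boundary */
    uint64_t core;  /* aligned middle, a multiple of the alignment */
    uint64_t tail;  /* unaligned bytes after the last boundary */
} DiscardSplit;

/* Split [offset, offset + bytes) at alignment boundaries. */
static DiscardSplit split_discard(uint64_t offset, uint64_t bytes,
                                  uint64_t align)
{
    DiscardSplit s = {0, 0, 0};
    uint64_t misalign = offset % align;

    if (misalign) {
        s.head = align - misalign;
        if (s.head > bytes) {
            s.head = bytes;     /* request ends before the boundary */
        }
    }
    uint64_t rest = bytes - s.head;
    s.tail = rest % align;
    s.core = rest - s.tail;
    return s;
}

/* Driver side: refuse unaligned requests instead of asserting. */
static int driver_discard(uint64_t offset, uint64_t bytes, uint64_t align)
{
    if (offset % align || bytes % align) {
        return -ENOTSUP;
    }
    return 0;   /* a real driver would perform the discard here */
}
```

A 10000-byte request at offset 1000 with 4096-byte alignment splits into a 3096-byte head, a 4096-byte core and a 2808-byte tail; only the core would be accepted by `driver_discard()`.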
    • E
      qcow2: Inform block layer about discard boundaries · ecdbead6
      Eric Blake authored
      At the qcow2 layer, discard is only possible on a per-cluster
      basis; at the moment, qcow2 silently rounds any unaligned
      requests to this granularity.  However, an upcoming patch will
      fix a regression in the block layer ignoring too much of an
      unaligned discard request, by changing the block layer to
      break up a discard request at alignment boundaries; for that
      to work, the block layer must know about our limits.
      
      However, we can't go one step further by changing
      qcow2_discard_clusters() to assert that requests are always
      aligned, since that helper function is reached on paths
      outside of the block layer.
      
      CC: qemu-stable@nongnu.org
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      ecdbead6
  13. 24 Oct, 2016 1 commit
  14. 19 Oct, 2016 1 commit
  15. 13 Sep, 2016 1 commit
    • S
      qcow2: avoid memcpy(dst, NULL, len) · 0647d47c
      Stefan Hajnoczi authored
      Section "7.1.4 Use of library functions" in the C99 standard says:
      
        If an argument to a function has an invalid value (such as [...]
        a null pointer [...]) [...] the behavior is undefined.
      
      Additionally the "searching and sorting" functions are specified as
      requiring valid pointer values as described in 7.1.4.
      
      This patch fixes the following sanitizer errors:
      
        block/qcow2.c:1807:41: runtime error: null pointer passed as argument 2, which is declared to never be null
        block/qcow2-cluster.c:86:26: runtime error: null pointer passed as argument 2, which is declared to never be null
      Reported-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Message-id: 1473758138-19260-1-git-send-email-stefanha@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      0647d47c
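
One common way to avoid the undefined behavior quoted above is to skip the call when there is nothing to copy. The wrapper `copy_if_nonempty` below is a hypothetical illustration of that pattern and is not taken from the patch itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: only call memcpy() when there is something to
 * copy, so a NULL source paired with a zero length is never passed to
 * the library function. Returns dst for convenience. */
static void *copy_if_nonempty(void *dst, const void *src, size_t len)
{
    if (len > 0) {
        memcpy(dst, src, len);
    }
    return dst;
}
```

The caller must still supply a valid `src` whenever `len` is non-zero; the guard covers only the empty case that the sanitizer reported.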
  16. 06 Sep, 2016 3 commits
  17. 26 Jul, 2016 1 commit
  18. 20 Jul, 2016 1 commit
  19. 13 Jul, 2016 1 commit
    • P
      coroutine: move entry argument to qemu_coroutine_create · 0b8b8753
      Paolo Bonzini authored
      In practice the entry argument is always known at creation time, and
      it is confusing that sometimes qemu_coroutine_enter is used with a
      non-NULL argument to re-enter a coroutine (this happens in
      block/sheepdog.c and tests/test-coroutine.c).  So pass the opaque value
      at creation time, for consistency with e.g. aio_bh_new.
      
      Mostly done with the following semantic patch:
      
      @ entry1 @
      expression entry, arg, co;
      @@
      - co = qemu_coroutine_create(entry);
      + co = qemu_coroutine_create(entry, arg);
        ...
      - qemu_coroutine_enter(co, arg);
      + qemu_coroutine_enter(co);
      
      @ entry2 @
      expression entry, arg;
      identifier co;
      @@
      - Coroutine *co = qemu_coroutine_create(entry);
      + Coroutine *co = qemu_coroutine_create(entry, arg);
        ...
      - qemu_coroutine_enter(co, arg);
      + qemu_coroutine_enter(co);
      
      @ entry3 @
      expression entry, arg;
      @@
      - qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
      + qemu_coroutine_enter(qemu_coroutine_create(entry, arg));
      
      @ reentry @
      expression co;
      @@
      - qemu_coroutine_enter(co, NULL);
      + qemu_coroutine_enter(co);
      
      except for the aforementioned few places where the semantic patch
      stumbled (as expected) and for test_co_queue, which would otherwise
      produce an uninitialized variable warning.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      0b8b8753
  20. 05 Jul, 2016 4 commits