1. 12 May, 2017 6 commits
    • S
      Merge tag 'block-pull-request' into staging · b54933ee
      Stefan Hajnoczi committed
      # gpg: Signature made Fri 12 May 2017 10:37:12 AM EDT
      # gpg:                using RSA key 0x9CA4ABB381AB73C8
      # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
      # gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
      # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8
      
      * tag 'block-pull-request':
        aio: add missing aio_notify() to aio_enable_external()
        block: Simplify BDRV_BLOCK_RAW recursion
        coroutine: remove GThread implementation
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      b54933ee
    • S
      Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging · 3753e255
      Stefan Hajnoczi committed
      Block layer patches
      
      # gpg: Signature made Thu 11 May 2017 10:31:37 AM EDT
      # gpg:                using RSA key 0x7F09B272C88F2FD6
      # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
      # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6
      
      * kwolf/tags/for-upstream: (58 commits)
        MAINTAINERS: Add qemu-progress to the block layer
        qcow2: Discard/zero clusters by byte count
        qcow2: Assert that cluster operations are aligned
        qcow2: Optimize write zero of unaligned tail cluster
        iotests: Add test 179 to cover write zeroes with unmap
        iotests: Improve _filter_qemu_img_map
        qcow2: Optimize zero_single_l2() to minimize L2 churn
        qcow2: Make distinction between zero cluster types obvious
        qcow2: Name typedef for cluster type
        qcow2: Correctly report status of preallocated zero clusters
        block: Update comments on BDRV_BLOCK_* meanings
        qcow2: Use consistent switch indentation
        qcow2: Nicer variable names in qcow2_update_snapshot_refcount()
        tests: Add coverage for recent block geometry fixes
        blkdebug: Add ability to override unmap geometries
        blkdebug: Simplify override logic
        blkdebug: Add pass-through write_zero and discard support
        blkdebug: Refactor error injection
        blkdebug: Sanity check block layer guarantees
        qemu-io: Switch 'map' output to byte-based reporting
        ...
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      3753e255
    • S
      aio: add missing aio_notify() to aio_enable_external() · 321d1dba
      Stefan Hajnoczi committed
      The main loop uses aio_disable_external()/aio_enable_external() to
      temporarily disable processing of external AioContext clients like
      device emulation.
      
      This allows monitor commands to quiesce I/O and prevent the guest from
      submitting new requests while a monitor command is in progress.
      
      The aio_enable_external() API is currently broken when an IOThread is in
      aio_poll() waiting for fd activity when the main loop re-enables
      external clients.  Incrementing ctx->external_disable_cnt does not wake
      the IOThread from ppoll(2) so fd processing remains suspended and leads
      to unresponsive emulated devices.
      
      This patch adds an aio_notify() call to aio_enable_external() so the
      IOThread is kicked out of ppoll(2) and will re-arm the file descriptors.
      
      The bug can be reproduced as follows:
      
        $ qemu -M accel=kvm -m 1024 \
               -object iothread,id=iothread0 \
               -device virtio-scsi-pci,iothread=iothread0,id=virtio-scsi-pci0 \
               -drive if=none,id=drive0,aio=native,cache=none,format=raw,file=test.img \
               -device scsi-hd,id=scsi-hd0,drive=drive0 \
               -qmp tcp::5555,server,nowait
      
        $ scripts/qmp/qmp-shell localhost:5555
        (qemu) blockdev-snapshot-sync device=drive0 snapshot-file=sn1.qcow2
               mode=absolute-paths format=qcow2
      
      After blockdev-snapshot-sync completes the SCSI disk will be
      unresponsive.  This leads to request timeouts inside the guest.
      Reported-by: Qianqian Zhu <qizhu@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Message-id: 20170508180705.20609-1-stefanha@redhat.com
      Suggested-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      321d1dba
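The missed wakeup described in this commit can be modelled in a few lines. This is only a single-threaded sketch on a POSIX system, not QEMU code: `select()` with a short timeout stands in for the blocking ppoll(2), one pipe stands in for the AioContext notifier, and another stands in for an external client's file descriptor. The names `wait_for_fds` and `simulate` are hypothetical.

```python
import os
import select


def wait_for_fds(fds, timeout=0.05):
    """Stand-in for ppoll(2): return the fds in `fds` that are readable."""
    ready, _, _ = select.select(fds, [], [], timeout)
    return ready


def simulate(notify_on_enable):
    """Return True if the poller ends up seeing the external client's event."""
    notify_r, notify_w = os.pipe()  # models the context's wakeup notifier
    guest_r, guest_w = os.pipe()    # models an external client fd
    try:
        # The poller went to sleep while external clients were disabled,
        # so the fd set it is waiting on does not include guest_r.
        armed = [notify_r]

        # The main loop re-enables external clients. With the fix it also
        # kicks the notifier; without it, nothing wakes the poller.
        if notify_on_enable:
            os.write(notify_w, b"x")

        # The guest submits a request on the external fd.
        os.write(guest_w, b"x")

        woke = wait_for_fds(armed)
        if notify_r in woke:
            os.read(notify_r, 1)         # acknowledge the kick
            armed = [notify_r, guest_r]  # re-arm, now including external fds
            woke = wait_for_fds(armed)
        return guest_r in woke
    finally:
        for fd in (notify_r, notify_w, guest_r, guest_w):
            os.close(fd)
```

Calling `simulate(False)` models the pre-patch behaviour, where the poller keeps waiting on a stale fd set; `simulate(True)` models the behaviour once a notification accompanies re-enabling.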
    • E
      block: Simplify BDRV_BLOCK_RAW recursion · ee29d6ad
      Eric Blake committed
      Since we are already in coroutine context during the body of
      bdrv_co_get_block_status(), we can shave off a few layers of
      wrappers when recursing to query the protocol when a format driver
      returned BDRV_BLOCK_RAW.
      
      Note that we are already using the correct recursion later on in
      the same function, when probing whether the protocol layer is sparse
      in order to find out if we can add BDRV_BLOCK_ZERO to an existing
      BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Message-id: 20170504173745.27414-1-eblake@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      ee29d6ad
    • D
      coroutine: remove GThread implementation · 33c53c54
      Daniel P. Berrange committed
      The GThread implementation is not functional enough to actually
      run QEMU reliably. While it was potentially useful for debugging,
      we have a scripts/qemugdb/coroutine.py to enable tracing of
      ucontext coroutines in GDB, so that removes the only reason for
      GThread to exist.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      Acked-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      33c53c54
    • L
      maintainers: Add myself as linux-user reviewer · ecc1f5ad
      Laurent Vivier committed
      I volunteer to review linux-user patches.
      Adding myself will help to not miss some of them.
      Signed-off-by: Laurent Vivier <laurent@vivier.eu>
      Acked-by: Riku Voipio <riku.voipio@linaro.org>
      Message-id: 20170510153950.29343-1-laurent@vivier.eu
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      ecc1f5ad
  2. 11 May, 2017 34 commits
    • K
      Merge remote-tracking branch 'mreitz/tags/pull-block-2017-05-11' into queue-block · d541e201
      Kevin Wolf committed
      Block patches for the block queue.
      
      # gpg: Signature made Thu May 11 14:28:41 2017 CEST
      # gpg:                using RSA key 0xF407DB0061D5CF40
      # gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
      # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40
      
      * mreitz/tags/pull-block-2017-05-11: (22 commits)
        MAINTAINERS: Add qemu-progress to the block layer
        qcow2: Discard/zero clusters by byte count
        qcow2: Assert that cluster operations are aligned
        qcow2: Optimize write zero of unaligned tail cluster
        iotests: Add test 179 to cover write zeroes with unmap
        iotests: Improve _filter_qemu_img_map
        qcow2: Optimize zero_single_l2() to minimize L2 churn
        qcow2: Make distinction between zero cluster types obvious
        qcow2: Name typedef for cluster type
        qcow2: Correctly report status of preallocated zero clusters
        block: Update comments on BDRV_BLOCK_* meanings
        qcow2: Use consistent switch indentation
        qcow2: Nicer variable names in qcow2_update_snapshot_refcount()
        tests: Add coverage for recent block geometry fixes
        blkdebug: Add ability to override unmap geometries
        blkdebug: Simplify override logic
        blkdebug: Add pass-through write_zero and discard support
        blkdebug: Refactor error injection
        blkdebug: Sanity check block layer guarantees
        qemu-io: Switch 'map' output to byte-based reporting
        ...
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      d541e201
    • M
      MAINTAINERS: Add qemu-progress to the block layer · 8dd30c86
      Max Reitz committed
      util/qemu-progress.c is currently unmaintained. The only user of its
      functionality is qemu-img, so it effectively is part of the block layer.
      Suggested-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170428165517.30341-1-mreitz@redhat.com
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      8dd30c86
    • E
      qcow2: Discard/zero clusters by byte count · d2cb36af
      Eric Blake committed
      Passing a byte offset, but sector count, when we ultimately
      want to operate on cluster granularity, is madness.  Clean up
      the external interfaces to take both offset and count as bytes,
      while still keeping the assertion added previously that the
      caller must align the values to a cluster.  Then rename things
      to make sure backports don't get confused by changed units:
      instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
      we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().
      
      The internal functions still operate on clusters at a time, and
      return an int for number of cleared clusters; but on an image
      with 2M clusters, a single L2 table holds 256k entries that each
      represent a 2M cluster, totalling well over INT_MAX bytes if we
      ever had a request for that many bytes at once.  All our callers
      currently limit themselves to 32-bit bytes (and therefore fewer
      clusters), but by making this function 64-bit clean, we have one
      less place to clean up if we later improve the block layer to
      support 64-bit bytes through all operations (with the block layer
      auto-fragmenting on behalf of more-limited drivers), rather than
      the current state where some interfaces are artificially limited
      to INT_MAX at a time.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170507000552.20847-13-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      d2cb36af
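The size argument in this commit message can be checked with a little arithmetic. The following is a minimal sketch, assuming the usual qcow2 layout in which an L2 table occupies one cluster and each L2 entry is 8 bytes; the variable names are illustrative and do not come from QEMU.

```python
INT_MAX = 2**31 - 1

cluster_size = 2 * 1024 * 1024  # 2M clusters, as in the commit message
l2_entry_size = 8               # assumption: 8-byte L2 entries

# One L2 table fills one cluster, so it holds this many entries.
l2_entries = cluster_size // l2_entry_size

# Guest bytes covered by a single L2 table.
bytes_per_l2_table = l2_entries * cluster_size

print(l2_entries)                    # entries per L2 table
print(bytes_per_l2_table)            # bytes mapped by one L2 table
print(bytes_per_l2_table > INT_MAX)  # whether a 32-bit int would overflow
```

Under these assumptions one L2 table holds 256k entries and maps 512 GiB, far beyond what a signed 32-bit byte count can express.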
    • E
      qcow2: Assert that cluster operations are aligned · f10ee139
      Eric Blake committed
      We already audited (in commit 0c1bd469) that qcow2_discard_clusters()
      is only passed cluster-aligned start values; but we can further
      tighten the assertion that the only unaligned end value is at EOF.
      
      Recent commits have taken advantage of an unaligned tail cluster,
      for both discard and write zeroes.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170507000552.20847-12-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      f10ee139
    • E
      qcow2: Optimize write zero of unaligned tail cluster · fbaa6bb3
      Eric Blake committed
      We've already improved discards to operate efficiently on the tail
      of an unaligned qcow2 image; it's time to make a similar improvement
      to write zeroes.  The special case is only valid at the tail
      cluster of a file, where we must recognize that any sectors beyond
      the image end would implicitly read as zero, and therefore should
      not penalize our logic for widening a partial cluster into writing
      the whole cluster as zero.
      
      However, note that for now, the special case of end-of-file is only
      recognized if there is no backing file, or if the backing file has
      the same length; that's because when the backing file is shorter
      than the active layer, we don't have code in place to recognize
      that reads of a sector unallocated at the top and beyond the backing
      end-of-file are implicitly zero.  It's not much of a real loss,
      because most people don't use images that aren't cluster-aligned,
      or where the active layer is a different size than the backing
      layer (especially where the difference falls within a single cluster).
      
      Update test 154 to cover the new scenarios, using two images of
      intentionally differing length.
      
      While at it, fix the test to gracefully skip when run as
      ./check -qcow2 -o compat=0.10 154
      since the older format lacks zero clusters already required earlier
      in the test.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170507000552.20847-11-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      fbaa6bb3
    • E
      iotests: Add test 179 to cover write zeroes with unmap · e249d519
      Eric Blake committed
      No tests were covering write zeroes with unmap.  Additionally,
      I needed to prove that my previous patches for correct status
      reporting and write zeroes optimizations actually had an impact.
      
      The test works for cluster_size between 8k and 2M (for smaller
      sizes, it fails because our allocation patterns are not contiguous
      with small clusters - in part, the largest consecutive allocation
      we tend to get is often bounded by the size covered by one L2
      table).
      
      Note that testing for zero clusters is tricky: 'qemu-io map'
      reports whether data comes from the current layer of the image
      (useful for sniffing out which regions of the file have
      QCOW_OFLAG_ZERO) - but doesn't show which clusters have mappings;
      while 'qemu-img map' sees "zero":true for both unallocated and
      zero clusters for any qcow2 with no backing layer (so less useful
      at detecting true zero clusters), but reliably shows mappings.
      So we have to rely on both queries side-by-side at each point of
      the test.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170507000552.20847-10-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      e249d519
    • E
      iotests: Improve _filter_qemu_img_map · d9ca2214
      Eric Blake committed
      Although _filter_qemu_img_map documents that it scrubs offsets, it
      was only doing so for human mode.  Of the existing tests using the
      filter (97, 122, 150, 154, 176), two of them are affected, but it
      does not hurt the validity of the tests to not require particular
      mappings (another test, 66, uses offsets but intentionally does not
      pass through _filter_qemu_img_map, because it checks that offsets
      are unchanged before and after an operation).
      
      Another justification for this patch is that it will allow a future
      patch to utilize 'qemu-img map --output=json' to check the status of
      preallocated zero clusters without regards to the mapping (since
      the qcow2 mapping can be very sensitive to the chosen cluster size,
      when preallocation is not in use).
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170507000552.20847-9-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      d9ca2214
    • E
      qcow2: Optimize zero_single_l2() to minimize L2 churn · 06cc5e2b
      Eric Blake committed
      Similar to discard_single_l2(), we should try to avoid dirtying
      the L2 cache when the cluster we are changing already has the
      right characteristics.
      
      Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP
      is a requirement to unallocate a cluster (this is because the block
      layer clears that flag if discard.* flags during open requested that
      we never punch holes - see the conversation around commit 170f4b2e,
      https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html).
      Therefore, this patch can only reuse a zero cluster as-is if either
      unmapping is not requested, or if the zero cluster was not associated
      with an allocation.
      
      Technically, there are some cases where an unallocated cluster
      already reads as all zeroes (namely, when there is no backing file
      [easy: check bs->backing], or when the backing file also reads as
      zeroes [harder: we can't check bdrv_get_block_status since we are
      already holding the lock]), where the guest would not immediately see
      a difference if we left that cluster unallocated.  But if the user
      did not request unmapping, leaving an unallocated cluster is wrong;
      and even if the user DID request unmapping, keeping a cluster
      unallocated risks a subtle semantic change of guest-visible contents
      if a backing file is later added, and it is not worth auditing
      whether all internal uses such as mirror properly avoid an unmap
      request.  Thus, this patch is intentionally limited to just clusters
      that are already marked as zero.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170507000552.20847-8-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      06cc5e2b
    • E
      qcow2: Make distinction between zero cluster types obvious · fdfab37d
      Eric Blake committed
      Treat plain zero clusters differently from allocated ones, so that
      we can simplify the logic of checking whether an offset is present.
      Do this by splitting QCOW2_CLUSTER_ZERO into two new enums,
      QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC.
      
      I tried to arrange the enum so that we could use
      'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and
      'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although
      I didn't actually end up taking advantage of the layout.
      
      In many cases, this leads to simpler code, by properly combining
      cases (sometimes, both zero types pair together, other times,
      plain zero is more like unallocated while allocated zero is more
      like normal).
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-id: 20170507000552.20847-7-eblake@redhat.com
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      fdfab37d
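The enum layout described in this commit message can be illustrated with a small model. This is a hypothetical Python sketch, not the C definition from QEMU: only the two zero types and the ordering property are taken from the message, while the remaining member names and all numeric values are assumptions made for illustration.

```python
from enum import IntEnum


class ClusterType(IntEnum):
    # Ordered so that everything up to ZERO_PLAIN has no host allocation
    # and everything from ZERO_ALLOC onwards does (values are assumed).
    UNALLOCATED = 0
    ZERO_PLAIN = 1   # reads as zero, no host cluster behind it
    ZERO_ALLOC = 2   # reads as zero, but a host cluster is reserved
    NORMAL = 3
    COMPRESSED = 4


def is_allocated(ctype):
    """Single range check made possible by the ordering."""
    return ctype >= ClusterType.ZERO_ALLOC


def reads_as_zero(ctype):
    """Cases where both zero types pair together."""
    return ctype in (ClusterType.ZERO_PLAIN, ClusterType.ZERO_ALLOC)
```

In this model, plain zero behaves like an unallocated cluster for allocation checks, while allocated zero behaves like a normal cluster, which mirrors the pairing the message describes.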
    • E
      qcow2: Name typedef for cluster type · 3ef95218
      Eric Blake committed
      Although it doesn't add all that much type safety (this is C, after
      all), it does add a bit of legibility to use the name QCow2ClusterType
      instead of a plain int.
      
      In particular, qcow2_get_cluster_offset() has an overloaded return
      type; a QCow2ClusterType on success, and -errno on failure; keeping
      the cluster type in a separate variable makes it slightly easier for
      the next patch to make further computations based on the type.
      Suggested-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-id: 20170507000552.20847-6-eblake@redhat.com
      [mreitz: Use the new type in two more places (one of them pulled from
               the next patch)]
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      3ef95218
    • E
      qcow2: Correctly report status of preallocated zero clusters · 4341df8a
      Eric Blake committed
      We were throwing away the preallocation information associated with
      zero clusters.  But we should be matching the well-defined semantics
      in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
      BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
      while still reminding the user that reading from that offset is
      likely to read garbage.
      
      count_contiguous_clusters_by_type() is now used only for unallocated
      cluster runs, hence it gets renamed and tightened.
      
      Making this change lets us see which portions of an image are zero
      but preallocated, when using qemu-img map --output=json.  The
      --output=human side intentionally ignores all zero clusters, whether
      or not they are preallocated.
      
      The fact that there is no change to qemu-iotests './check -qcow2'
      merely means that we aren't yet testing this aspect of qemu-img;
      a later patch will add a test.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170507000552.20847-5-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      4341df8a
    • E
      block: Update comments on BDRV_BLOCK_* meanings · 4c41cb49
      Eric Blake committed
      We had some conflicting documentation: a nice 8-way table that
      described all possible combinations of DATA, ZERO, and
      OFFSET_VALID, contrasted with text that implied that OFFSET_VALID
      always meant raw data could be read directly.  Furthermore, the
      text refers a lot to bs->file, even though the interface was
      updated back in 67a0fd2a to let the driver pass back a specific
      BDS (not necessarily bs->file).  As the 8-way table is the
      intended semantics, simplify the rest of the text to get rid of
      the confusion.
      
      ALLOCATED is always set by the block layer for convenience (drivers
      do not have to worry about it).  RAW is used only internally, but
      by more than the raw driver.  Document these additional items on
      the driver callback.
      Suggested-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170507000552.20847-4-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      4c41cb49
    • E
      qcow2: Use consistent switch indentation · bbd995d8
      Eric Blake committed
      Fix a couple of inconsistent indentations, before an upcoming
      patch further tweaks the switch statements.
      (best viewed with 'git diff -b').
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170507000552.20847-3-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      bbd995d8
    • E
      qcow2: Nicer variable names in qcow2_update_snapshot_refcount() · b32cbae1
      Eric Blake committed
      In order to keep checkpatch happy when the next patch changes
      indentation, we first have to shorten some long lines.  The easiest
      approach is to use a new variable in place of
      'offset & L2E_OFFSET_MASK', except that 'offset' is the best name
      for that variable.  Change '[old_]offset' to '[old_]entry' to
      make room.
      
      While touching things, also fix checkpatch warnings about unusual
      'for' statements.
      
      Suggested by Max Reitz <mreitz@redhat.com>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-id: 20170507000552.20847-2-eblake@redhat.com
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      b32cbae1
    • E
      tests: Add coverage for recent block geometry fixes · 40812d93
      Eric Blake committed
      Use blkdebug's new geometry constraints to emulate setups that
      have needed past regression fixes: write zeroes asserting
      when running through a loopback block device with max-transfer
      smaller than cluster size, and discard rounding away portions
      of requests not aligned to preferred boundaries.  Also, add
      coverage that the block layer is honoring max transfer limits.
      
      For now, a single iotest performs all actions, with the idea
      that we can add future blkdebug constraint test cases in the
      same file; but it can be split into multiple iotests if we find
      reason to run one portion of the test in more setups than what
      are possible in the other.
      
      For reference, the final portion of the test (checking whether
      discard passes as much as possible to the lowest layers of the
      stack) works as follows:
      
      qemu-io: discard 30M at 80000001, passed to blkdebug
        blkdebug: discard 511 bytes at 80000001, -ENOTSUP (smaller than
      blkdebug's 512 align)
        blkdebug: discard 14371328 bytes at 80000512, passed to qcow2
          qcow2: discard 739840 bytes at 80000512, -ENOTSUP (smaller than
      qcow2's 1M align)
          qcow2: discard 13M bytes at 77M, succeeds
        blkdebug: discard 15M bytes at 90M, passed to qcow2
          qcow2: discard 15M bytes at 90M, succeeds
        blkdebug: discard 1356800 bytes at 105M, passed to qcow2
          qcow2: discard 1M at 105M, succeeds
          qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's
      1M align)
        blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than
      blkdebug's 512 align)
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170429191419.30051-10-eblake@redhat.com
      [mreitz: For cooperation with image locking, add -r to the qemu-io
               invocation which verifies the image content]
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      40812d93
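The fragmentation trace in this commit message can be reproduced with a short model. This is a simplified sketch, not the block layer's actual algorithm: it assumes a request is cut at the next request-alignment boundary, then at the next preferred-alignment boundary, is capped at a maximum length, and has any unaligned tail split off. The function `split_request` and its parameters are hypothetical; the 512-byte and 1M alignments come from the trace, while the 15M preferred/maximum size for blkdebug and the 512-byte request alignment for qcow2 are inferred assumptions.

```python
M = 1024 * 1024


def round_up(n, align):
    """Smallest multiple of align that is >= n."""
    return (n + align - 1) // align * align


def split_request(offset, length, request_align, pref_align, max_len):
    """Split [offset, offset + length) into (offset, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        if offset % request_align:
            # Unaligned head: stop at the next request-alignment boundary.
            stop = round_up(offset, request_align)
        elif offset % pref_align:
            # Stop at the next preferred-alignment boundary.
            stop = round_up(offset, pref_align)
        else:
            # Fully aligned: take as much as the size limit allows.
            stop = offset + max_len
        stop = min(stop, end)
        if stop == end:
            # Keep an unaligned tail in a piece of its own.
            for align in (pref_align, request_align):
                aligned_stop = end - end % align
                if aligned_stop > offset:
                    stop = aligned_stop
                    break
        pieces.append((offset, stop - offset))
        offset = stop
    return pieces


# blkdebug layer: 30M discard at 80000001, 512 align, 15M preferred and max.
blkdebug_pieces = split_request(80000001, 30 * M, 512, 15 * M, 15 * M)
for piece in blkdebug_pieces:
    print(piece)
```

Feeding the pieces that blkdebug passes down into the same function with a 1M preferred alignment yields the qcow2-level splits from the trace, for example `split_request(80000512, 14371328, 512, M, 1 << 30)`.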
    • E
      blkdebug: Add ability to override unmap geometries · 430b26a8
      Eric Blake committed
      Make it easier to simulate various unusual hardware setups (for
      example, recent commits 3482b9bc and b8d0a980 affect the Dell
      Equallogic iSCSI with its 15M preferred and maximum unmap and
      write zero sizing, or b2f95fee deals with the Linux loopback
      block device having a max_transfer of 64k), by allowing blkdebug
      to wrap any other device with further restrictions on various
      alignments.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170429191419.30051-9-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      430b26a8
    • E
      blkdebug: Simplify override logic · 3dc834f8
      Eric Blake committed
      Rather than store into a local variable, then copy to the struct
      if the value is valid, then reporting errors otherwise, it is
      simpler to just store into the struct and report errors if the
      value is invalid.  This however requires that the struct store
      a 64-bit number, rather than a narrower type.  Likewise, setting
      a sane errno value in ret prior to the sequence of parsing and
      jumping to out: on error makes it easier for the next patch to
      add a chain of similar checks.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-id: 20170429191419.30051-8-eblake@redhat.com
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      3dc834f8
    • E
      blkdebug: Add pass-through write_zero and discard support · 63188c24
      Eric Blake committed
      In order to test the effects of artificial geometry constraints
      on operations like write zero or discard, we first need blkdebug
      to manage these actions.  It also allows us to inject errors on
      those operations, just like we can for read/write/flush.
      
      We can also test the contract promised by the block layer; namely,
      if a device has specified limits on alignment or maximum size,
      then those limits must be obeyed (for now, the blkdebug driver
      merely inherits limits from whatever it is wrapping, but the next
      patch will further enhance it to allow specific limit overrides).
      
      This patch intentionally refuses to service requests smaller than
      the requested alignments; this is because an upcoming patch adds
      a qemu-iotest to prove that the block layer is correctly handling
      fragmentation, but the test only works if there is a way to tell
      the difference at artificial alignment boundaries when blkdebug is
      using a larger-than-default alignment.  If we let the blkdebug
      layer always defer to the underlying layer, which potentially has
      a smaller granularity, the iotest will be thwarted.
      
      Tested by setting up an NBD server with export 'foo', then invoking:
      $ ./qemu-io
      qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
      qemu-io> d 0 15M
      qemu-io> w -z 0 15M
      
      Pre-patch, the server never sees the discard (it was silently
      eaten by the block layer); post-patch it is passed across the
      wire.  Likewise, pre-patch the write is always passed with
      NBD_WRITE (with 15M of zeroes on the wire), while post-patch
      it can utilize NBD_WRITE_ZEROES (for less traffic).
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170429191419.30051-7-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      63188c24
    • E
      blkdebug: Refactor error injection · d157ed5f
      Eric Blake committed
      Rather than repeat the logic at each caller of checking if a Rule
      exists that warrants an error injection, fold that logic into
      inject_error(); and rename it to rule_check() for legibility.
      This will help the next patch, which adds two more callers that
      need to check rules for the potential of injecting errors.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170429191419.30051-6-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      d157ed5f
    • E
      blkdebug: Sanity check block layer guarantees · e0ef4395
      Eric Blake committed
      Commits 04ed95f4 and 1a62d0ac updated the block layer to auto-fragment
      any I/O to fit within device boundaries. Additionally, when using a
      minimum alignment of 4k, we want to ensure the block layer does proper
      read-modify-write rather than requesting I/O on a slice of a sector.
      Let's enforce that the contract is obeyed when using blkdebug.  For
      now, blkdebug only allows alignment overrides, and just inherits other
      limits from whatever device it is wrapping, but a future patch will
      further enhance things.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20170429191419.30051-5-eblake@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      e0ef4395
    • E
      qemu-io: Switch 'map' output to byte-based reporting · 6f3c90af
      Eric Blake committed
      Mixing byte offset and sector allocation counts is a bit
      confusing.  Also, reporting n/m sectors, where m decreases
      according to the remaining size of the file, isn't really
      adding any useful information; and reporting an offset at
      both the front and end of the line, with large amounts of
      whitespace, is pointless.  Update the output to use byte
      counts and shorter lines, then adjust the affected tests
      (./check -qcow2 102, ./check -vpc 146).
      
      Note that 'qemu-io map' is MUCH weaker than 'qemu-img map';
      the former only shows which regions of the active layer are
      allocated, without regards to where the allocation comes from
      or whether the allocated portion is known to read as zero
      (because it is using the weaker bdrv_is_allocated()); while the
      latter (especially in --output=json mode) reports more details
      from bdrv_get_block_status().
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-id: 20170429191419.30051-4-eblake@redhat.com
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      6f3c90af
    • E
      qemu-io: Switch 'alloc' command to byte-based length · 4401fdc7
      Eric Blake committed
      For the 'alloc' command, accepting an offset in bytes but a length
      in sectors, and reporting output in sectors, is confusing.  Do
      everything in bytes, and adjust the expected output accordingly.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-id: 20170429191419.30051-3-eblake@redhat.com
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      4401fdc7
    • E
      qemu-io: Improve alignment checks · 1bce6b4c
      Eric Blake committed
      Several copy-and-pasted alignment checks exist in qemu-io, which
      could use some minor improvements:
      
      - Manual comparison against 0x1ff is not as clean as using our
      alignment macros (QEMU_IS_ALIGNED) from osdep.h.
      
      - The error messages aren't quite grammatically correct.
      Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Suggested-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-id: 20170429191419.30051-2-eblake@redhat.com
      Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      1bce6b4c
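
To illustrate the first point of the commit above, here is a minimal sketch comparing an open-coded mask test with a macro-based one. `SECTOR_SIZE` and `IS_ALIGNED` are local stand-ins written for this example; they are assumed to behave like the corresponding QEMU definitions but are not copied from osdep.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for illustration (assumed, not QEMU's definitions). */
#define SECTOR_SIZE 512
#define IS_ALIGNED(n, m) (((n) % (m)) == 0)

/* Old style: the reader has to know that 0x1ff means "512 minus 1". */
bool offset_aligned_mask(int64_t offset)
{
    return (offset & 0x1ff) == 0;
}

/* New style: the alignment being checked is named explicitly. */
bool offset_aligned_macro(int64_t offset)
{
    return IS_ALIGNED(offset, SECTOR_SIZE);
}
```

Both functions agree for non-negative offsets; the macro form simply states the intent and avoids repeating a magic constant at every call site.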
    • J
      blockdev: use drained_begin/end for qmp_block_resize · 698bdfa0
      John Snow committed
      Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1447551
      
      If one tries to issue a block_resize while a guest is busy
      accessing the disk, it is possible that qemu may deadlock
      when invoking aio_poll from both the main loop and the iothread.
      
      Replace another instance of bdrv_drain_all that doesn't
      quite belong with a drained_begin/end section.
      
      Cc: qemu-stable@nongnu.org
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      698bdfa0
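
The idea of bracketing a resize with a drained section can be sketched as follows. This is a single-threaded toy model under stated assumptions: the `Disk` type and the `disk_*` functions are hypothetical, and the loop in `disk_drained_begin()` only stands in for polling until in-flight requests complete. It does not reproduce QEMU's implementation or the deadlock itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device model; not QEMU's BlockDriverState. */
typedef struct Disk {
    int quiesce_counter;  /* > 0 while new requests are held back */
    int in_flight;        /* requests issued but not yet completed */
    int64_t size;         /* current size in bytes */
} Disk;

/* Begin a drained section: hold back new requests, wait out old ones. */
void disk_drained_begin(Disk *d)
{
    d->quiesce_counter++;
    while (d->in_flight > 0) {
        d->in_flight--;   /* stands in for polling until a request completes */
    }
}

/* End a drained section: requests may flow again once the counter is 0. */
void disk_drained_end(Disk *d)
{
    assert(d->quiesce_counter > 0);
    d->quiesce_counter--;
}

/* Resize only while this one disk is quiesced. */
void disk_resize(Disk *d, int64_t new_size)
{
    disk_drained_begin(d);
    assert(d->in_flight == 0);
    d->size = new_size;
    disk_drained_end(d);
}
```

The begin/end pair is scoped to the one device being resized and keeps it quiet for the whole operation, rather than draining once and hoping nothing new arrives.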
    • C
      nvme: Implement Write Zeroes · c03e7ef1
      Christoph Hellwig committed
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      [hch: ported over from qemu-nvme.git to mainline]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      c03e7ef1
    • A
      qemu-img: wait for convert coroutines to complete · b91127ed
      Anton Nefedov committed
      On the error path (such as an I/O error in one of the coroutines), it is
      required to
        - wait for the coroutines to complete before cleaning up the common
          structures
        - reenter dependent coroutines so that they can finish

      The bug was introduced in 2d9187bc.

      Cc: qemu-stable@nongnu.org
      Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
      Reviewed-by: Peter Lieven <pl@kamp.de>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      b91127ed
    • K
      file-posix: Remove .bdrv_inactivate/invalidate_cache · 22d5cd82
      Kevin Wolf committed
      Now that the block layer takes care to request far fewer permissions
      for inactive nodes, the special-casing in file-posix isn't necessary any
      more.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      22d5cd82
    • K
      block: Fix write/resize permissions for inactive images · 9c5e6594
      Kevin Wolf committed
      Format drivers for inactive nodes don't need write/resize permissions on
      their bs->file and can share write/resize with another VM (in fact, this
      is the whole point of keeping images inactive). Represent this fact in
      the op blocker system, so that image locking does the right thing
      without special-casing inactive images.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      9c5e6594
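
The permission rule described in the commit above can be sketched with plain bit masks. The `PERM_*` values and the `format_child_perms()` helper are hypothetical and written for this example; QEMU's real flags are the BLK_PERM_* constants and its permission code handles more cases than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
enum {
    PERM_CONSISTENT_READ = 1 << 0,
    PERM_WRITE           = 1 << 1,
    PERM_RESIZE          = 1 << 2,
};

/*
 * Compute what a format driver needs on its file child (perm) and what
 * it tolerates from other users of that child (shared).
 */
void format_child_perms(bool inactive, bool writable,
                        uint32_t *perm, uint32_t *shared)
{
    *perm = PERM_CONSISTENT_READ;
    *shared = PERM_CONSISTENT_READ;

    if (inactive) {
        /* Inactive: take nothing beyond reads, let others write/resize. */
        *shared |= PERM_WRITE | PERM_RESIZE;
    } else if (writable) {
        /* Active and writable: require write/resize, do not share them. */
        *perm |= PERM_WRITE | PERM_RESIZE;
    }
}
```

Because the rule lives in the permission computation, a locking layer built on these masks needs no separate check for inactive images.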
    • K
      block: Inactivate parents before children · 38701b6a
      Kevin Wolf committed
      The proper order for inactivating block nodes is that first the parents
      get inactivated and then the children. If we do things in this order, we
      can assert that we didn't accidentally leave a parent activated when one
      of its child nodes is inactive.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      38701b6a
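
The ordering and the assertion it enables can be sketched on a simple tree. The `Node` type and `node_inactivate()` are hypothetical; a real block graph can give a node several parents, which this single-parent model deliberately ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical node type for illustration; not QEMU's BlockDriverState. */
typedef struct Node {
    bool inactive;
    struct Node *parent;                  /* NULL for a root */
    struct Node *children[MAX_CHILDREN];
    size_t nb_children;
} Node;

/* Inactivate a node first, then recurse into its children (pre-order). */
void node_inactivate(Node *n)
{
    /* Holds only because parents are always handled before children. */
    assert(n->parent == NULL || n->parent->inactive);

    n->inactive = true;
    for (size_t i = 0; i < n->nb_children; i++) {
        node_inactivate(n->children[i]);
    }
}
```

With the opposite (children first) order, the assertion at the top of the function would fire as soon as the first child was visited.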
    • K
      block: Drop permissions when migration completes · cfa1a572
      Kevin Wolf committed
      With image locking, permissions affect other qemu processes as well. We
      want to be sure that the destination can run, so let's drop permissions
      on the source when migration completes.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      cfa1a572
    • K
      block: New BdrvChildRole.activate() for blk_resume_after_migration() · 4417ab7a
      Kevin Wolf committed
      Instead of manually calling blk_resume_after_migration() in migration
      code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend
      activation with cache invalidation into a single function. This is
      achieved with a new callback in BdrvChildRole that is called by
      bdrv_invalidate_cache_all().
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      4417ab7a
    • K
      migration: Unify block node activation error handling · ace21a58
      Kevin Wolf committed
      Migration code activates all block driver nodes on the destination when
      the migration completes. It does so by calling
      bdrv_invalidate_cache_all() and blk_resume_after_migration(). There is
      one code path for precopy and one for postcopy migration, resulting in
      four function calls, which used to have three different failure modes.
      
      This patch unifies the behaviour so that failure to activate all block
      nodes is non-fatal, but the error message is logged and the VM isn't
      automatically started. 'cont' will retry activating the block nodes.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      ace21a58
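
The unified behaviour described in the commit above (failure is logged, the VM is not started, and 'cont' retries) can be sketched as a small state model. Everything here is hypothetical: `VmState`, `activate_all_nodes()`, `migration_finish()` and `vm_cont()` are names invented for this example, and activation failure is simulated with a counter.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical destination-side state; not a QEMU structure. */
typedef struct VmState {
    bool nodes_active;
    bool running;
    int failures_left;   /* simulated: activation fails this many times */
} VmState;

/* Simulated activation of all block nodes. Returns 0 or -1. */
static int activate_all_nodes(VmState *vm)
{
    if (vm->failures_left > 0) {
        vm->failures_left--;
        return -1;
    }
    vm->nodes_active = true;
    return 0;
}

/* One completion path, shared by precopy and postcopy in this model. */
void migration_finish(VmState *vm, bool autostart)
{
    if (activate_all_nodes(vm) < 0) {
        /* Non-fatal: report the error and leave the VM paused. */
        fprintf(stderr, "could not activate block nodes; VM stays paused\n");
        return;
    }
    if (autostart) {
        vm->running = true;
    }
}

/* 'cont' retries activation before resuming the guest. */
int vm_cont(VmState *vm)
{
    if (!vm->nodes_active && activate_all_nodes(vm) < 0) {
        return -1;
    }
    vm->running = true;
    return 0;
}
```

Routing both migration modes through one function is what gives them a single failure mode instead of several.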
    • M
      iotests: Extend test 066 · aa93c834
      Max Reitz committed
      066 was supposed to be a test "for discarding preallocated zero
      clusters", but it did so incompletely: While it did check the image
      file's integrity after the operation, it did not confirm that the
      clusters are indeed freed. This patch adds this test.
      
      In addition, new cases for writing to preallocated zero clusters are
      added.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      aa93c834
    • M
      qcow2: Discard preallocated zero clusters · 293073a5
      Max Reitz committed
      In discard_single_l2(), we completely discard normal clusters instead of
      simply turning them into preallocated zero clusters. That means we
      should probably do the same with such preallocated zero clusters:
      Discard them instead of keeping them allocated.
      Reported-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      293073a5
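
The change in the commit above can be sketched as a switch over cluster types. The `ClusterType` enum, the `L2Entry` struct and `discard_entry()` are hypothetical simplifications, not qcow2's on-disk format or its real discard code. In particular, this model only tracks whether a host cluster is released and always leaves the entry unallocated afterwards, which is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cluster classification; not qcow2's real enum. */
typedef enum ClusterType {
    CLUSTER_UNALLOCATED,
    CLUSTER_ZERO_PLAIN,          /* reads as zero, no host cluster */
    CLUSTER_ZERO_PREALLOCATED,   /* reads as zero, host cluster kept */
    CLUSTER_NORMAL,
} ClusterType;

/* Hypothetical in-memory view of one L2 entry. */
typedef struct L2Entry {
    ClusterType type;
    uint64_t host_offset;        /* 0 when no host cluster is attached */
} L2Entry;

/* Discard one entry. Returns true if a host cluster was released. */
bool discard_entry(L2Entry *e)
{
    switch (e->type) {
    case CLUSTER_UNALLOCATED:
    case CLUSTER_ZERO_PLAIN:
        return false;            /* nothing on disk to free */
    case CLUSTER_NORMAL:
    case CLUSTER_ZERO_PREALLOCATED:
        /* Treat preallocated zero clusters like normal ones: free them. */
        e->type = CLUSTER_UNALLOCATED;
        e->host_offset = 0;
        return true;
    }
    return false;
}
```

Grouping the preallocated zero case with the normal case is the whole change in this model: both hold a host cluster, so both give it up on discard.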