提交 · b54933eed532b10c8a1967d9f988262ccbb94ee2 · openeuler / qemu

12 5月, 2017 6 次提交

Merge tag 'block-pull-request' into staging · b54933ee

由 Stefan Hajnoczi 提交于 5月 12, 2017

# gpg: Signature made Fri 12 May 2017 10:37:12 AM EDT
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* tag 'block-pull-request':
  aio: add missing aio_notify() to aio_enable_external()
  block: Simplify BDRV_BLOCK_RAW recursion
  coroutine: remove GThread implementation
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

b54933ee

Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging · 3753e255

由 Stefan Hajnoczi 提交于 5月 12, 2017

Block layer patches

# gpg: Signature made Thu 11 May 2017 10:31:37 AM EDT
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* kwolf/tags/for-upstream: (58 commits)
  MAINTAINERS: Add qemu-progress to the block layer
  qcow2: Discard/zero clusters by byte count
  qcow2: Assert that cluster operations are aligned
  qcow2: Optimize write zero of unaligned tail cluster
  iotests: Add test 179 to cover write zeroes with unmap
  iotests: Improve _filter_qemu_img_map
  qcow2: Optimize zero_single_l2() to minimize L2 churn
  qcow2: Make distinction between zero cluster types obvious
  qcow2: Name typedef for cluster type
  qcow2: Correctly report status of preallocated zero clusters
  block: Update comments on BDRV_BLOCK_* meanings
  qcow2: Use consistent switch indentation
  qcow2: Nicer variable names in qcow2_update_snapshot_refcount()
  tests: Add coverage for recent block geometry fixes
  blkdebug: Add ability to override unmap geometries
  blkdebug: Simplify override logic
  blkdebug: Add pass-through write_zero and discard support
  blkdebug: Refactor error injection
  blkdebug: Sanity check block layer guarantees
  qemu-io: Switch 'map' output to byte-based reporting
  ...
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

3753e255

aio: add missing aio_notify() to aio_enable_external() · 321d1dba

由 Stefan Hajnoczi 提交于 5月 08, 2017

The main loop uses aio_disable_external()/aio_enable_external() to
temporarily disable processing of external AioContext clients like
device emulation.

This allows monitor commands to quiesce I/O and prevent the guest from
submitting new requests while a monitor command is in progress.

The aio_enable_external() API is currently broken when an IOThread is in
aio_poll() waiting for fd activity when the main loop re-enables
external clients.  Incrementing ctx->external_disable_cnt does not wake
the IOThread from ppoll(2) so fd processing remains suspended and leads
to unresponsive emulated devices.

This patch adds an aio_notify() call to aio_enable_external() so the
IOThread is kicked out of ppoll(2) and will re-arm the file descriptors.

The bug can be reproduced as follows:

  $ qemu -M accel=kvm -m 1024 \
         -object iothread,id=iothread0 \
         -device virtio-scsi-pci,iothread=iothread0,id=virtio-scsi-pci0 \
         -drive if=none,id=drive0,aio=native,cache=none,format=raw,file=test.img \
         -device scsi-hd,id=scsi-hd0,drive=drive0 \
         -qmp tcp::5555,server,nowait

  $ scripts/qmp/qmp-shell localhost:5555
  (qemu) blockdev-snapshot-sync device=drive0 snapshot-file=sn1.qcow2
         mode=absolute-paths format=qcow2

After blockdev-snapshot-sync completes the SCSI disk will be
unresponsive.  This leads to request timeouts inside the guest.
Reported-by: NQianqian Zhu <qizhu@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170508180705.20609-1-stefanha@redhat.com
Suggested-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

321d1dba

block: Simplify BDRV_BLOCK_RAW recursion · ee29d6ad

由 Eric Blake 提交于 5月 04, 2017

Since we are already in coroutine context during the body of
bdrv_co_get_block_status(), we can shave off a few layers of
wrappers when recursing to query the protocol when a format driver
returned BDRV_BLOCK_RAW.

Note that we are already using the correct recursion later on in
the same function, when probing whether the protocol layer is sparse
in order to find out if we can add BDRV_BLOCK_ZERO to an existing
BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 20170504173745.27414-1-eblake@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

ee29d6ad

coroutine: remove GThread implementation · 33c53c54

由 Daniel P. Berrange 提交于 4月 28, 2017

The GThread implementation is not functional enough to actually
run QEMU reliably. While it was potentially useful for debugging,
we have a scripts/qemugdb/coroutine.py to enable tracing of
ucontext coroutines in GDB, so that removes the only reason for
GThread to exist.
Signed-off-by: NDaniel P. Berrange <berrange@redhat.com>
Acked-by: NAlex Bennée <alex.bennee@linaro.org>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

33c53c54

maintainers: Add myself as linux-user reviewer · ecc1f5ad

由 Laurent Vivier 提交于 5月 10, 2017

I volunteer to review linux-user patches.
Adding myself will help to not miss some of them.
Signed-off-by: NLaurent Vivier <laurent@vivier.eu>
Acked-by: NRiku Voipio <riku.voipio@linaro.org>
Message-id: 20170510153950.29343-1-laurent@vivier.eu
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

ecc1f5ad

11 5月, 2017 34 次提交

Merge remote-tracking branch 'mreitz/tags/pull-block-2017-05-11' into queue-block · d541e201

由 Kevin Wolf 提交于 5月 11, 2017

Block patches for the block queue.

# gpg: Signature made Thu May 11 14:28:41 2017 CEST
# gpg:                using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* mreitz/tags/pull-block-2017-05-11: (22 commits)
  MAINTAINERS: Add qemu-progress to the block layer
  qcow2: Discard/zero clusters by byte count
  qcow2: Assert that cluster operations are aligned
  qcow2: Optimize write zero of unaligned tail cluster
  iotests: Add test 179 to cover write zeroes with unmap
  iotests: Improve _filter_qemu_img_map
  qcow2: Optimize zero_single_l2() to minimize L2 churn
  qcow2: Make distinction between zero cluster types obvious
  qcow2: Name typedef for cluster type
  qcow2: Correctly report status of preallocated zero clusters
  block: Update comments on BDRV_BLOCK_* meanings
  qcow2: Use consistent switch indentation
  qcow2: Nicer variable names in qcow2_update_snapshot_refcount()
  tests: Add coverage for recent block geometry fixes
  blkdebug: Add ability to override unmap geometries
  blkdebug: Simplify override logic
  blkdebug: Add pass-through write_zero and discard support
  blkdebug: Refactor error injection
  blkdebug: Sanity check block layer guarantees
  qemu-io: Switch 'map' output to byte-based reporting
  ...
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

d541e201

MAINTAINERS: Add qemu-progress to the block layer · 8dd30c86

由 Max Reitz 提交于 4月 28, 2017

util/qemu-progress.c is currently unmaintained. The only user of its
functionality is qemu-img, so it effectively is part of the block layer.
Suggested-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170428165517.30341-1-mreitz@redhat.com
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

8dd30c86

qcow2: Discard/zero clusters by byte count · d2cb36af

由 Eric Blake 提交于 5月 06, 2017

Passing a byte offset, but sector count, when we ultimately
want to operate on cluster granularity, is madness.  Clean up
the external interfaces to take both offset and count as bytes,
while still keeping the assertion added previously that the
caller must align the values to a cluster.  Then rename things
to make sure backports don't get confused by changed units:
instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().

The internal functions still operate on clusters at a time, and
return an int for number of cleared clusters; but on an image
with 2M clusters, a single L2 table holds 256k entries that each
represent a 2M cluster, totalling well over INT_MAX bytes if we
ever had a request for that many bytes at once.  All our callers
currently limit themselves to 32-bit bytes (and therefore fewer
clusters), but by making this function 64-bit clean, we have one
less place to clean up if we later improve the block layer to
support 64-bit bytes through all operations (with the block layer
auto-fragmenting on behalf of more-limited drivers), rather than
the current state where some interfaces are artificially limited
to INT_MAX at a time.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-13-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

d2cb36af

qcow2: Assert that cluster operations are aligned · f10ee139

由 Eric Blake 提交于 5月 06, 2017

We already audited (in commit 0c1bd469) that qcow2_discard_clusters()
is only passed cluster-aligned start values; but we can further
tighten the assertion that the only unaligned end value is at EOF.

Recent commits have taken advantage of an unaligned tail cluster,
for both discard and write zeroes.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-12-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

f10ee139

qcow2: Optimize write zero of unaligned tail cluster · fbaa6bb3

由 Eric Blake 提交于 5月 06, 2017

We've already improved discards to operate efficiently on the tail
of an unaligned qcow2 image; it's time to make a similar improvement
to write zeroes.  The special case is only valid at the tail
cluster of a file, where we must recognize that any sectors beyond
the image end would implicitly read as zero, and therefore should
not penalize our logic for widening a partial cluster into writing
the whole cluster as zero.

However, note that for now, the special case of end-of-file is only
recognized if there is no backing file, or if the backing file has
the same length; that's because when the backing file is shorter
than the active layer, we don't have code in place to recognize
that reads of a sector unallocated at the top and beyond the backing
end-of-file are implicitly zero.  It's not much of a real loss,
because most people don't use images that aren't cluster-aligned,
or where the active layer is a different size than the backing
layer (especially where the difference falls within a single cluster).

Update test 154 to cover the new scenarios, using two images of
intentionally differing length.

While at it, fix the test to gracefully skip when run as
./check -qcow2 -o compat=0.10 154
since the older format lacks zero clusters already required earlier
in the test.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-11-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

fbaa6bb3

iotests: Add test 179 to cover write zeroes with unmap · e249d519

由 Eric Blake 提交于 5月 06, 2017

No tests were covering write zeroes with unmap.  Additionally,
I needed to prove that my previous patches for correct status
reporting and write zeroes optimizations actually had an impact.

The test works for cluster_size between 8k and 2M (for smaller
sizes, it fails because our allocation patterns are not contiguous
with small clusters - in part, the largest consecutive allocation
we tend to get is often bounded by the size covered by one L2
table).

Note that testing for zero clusters is tricky: 'qemu-io map'
reports whether data comes from the current layer of the image
(useful for sniffing out which regions of the file have
QCOW_OFLAG_ZERO) - but doesn't show which clusters have mappings;
while 'qemu-img map' sees "zero":true for both unallocated and
zero clusters for any qcow2 with no backing layer (so less useful
at detecting true zero clusters), but reliably shows mappings.
So we have to rely on both queries side-by-side at each point of
the test.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-10-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

e249d519

iotests: Improve _filter_qemu_img_map · d9ca2214

由 Eric Blake 提交于 5月 06, 2017

Although _filter_qemu_img_map documents that it scrubs offsets, it
was only doing so for human mode.  Of the existing tests using the
filter (97, 122, 150, 154, 176), two of them are affected, but it
does not hurt the validity of the tests to not require particular
mappings (another test, 66, uses offsets but intentionally does not
pass through _filter_qemu_img_map, because it checks that offsets
are unchanged before and after an operation).

Another justification for this patch is that it will allow a future
patch to utilize 'qemu-img map --output=json' to check the status of
preallocated zero clusters without regards to the mapping (since
the qcow2 mapping can be very sensitive to the chosen cluster size,
when preallocation is not in use).
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-9-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

d9ca2214

qcow2: Optimize zero_single_l2() to minimize L2 churn · 06cc5e2b

由 Eric Blake 提交于 5月 06, 2017

Similar to discard_single_l2(), we should try to avoid dirtying
the L2 cache when the cluster we are changing already has the
right characteristics.

Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP
is a requirement to unallocate a cluster (this is because the block
layer clears that flag if discard.* flags during open requested that
we never punch holes - see the conversation around commit 170f4b2e,
https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html).
Therefore, this patch can only reuse a zero cluster as-is if either
unmapping is not requested, or if the zero cluster was not associated
with an allocation.

Technically, there are some cases where an unallocated cluster
already reads as all zeroes (namely, when there is no backing file
[easy: check bs->backing], or when the backing file also reads as
zeroes [harder: we can't check bdrv_get_block_status since we are
already holding the lock]), where the guest would not immediately see
a difference if we left that cluster unallocated.  But if the user
did not request unmapping, leaving an unallocated cluster is wrong;
and even if the user DID request unmapping, keeping a cluster
unallocated risks a subtle semantic change of guest-visible contents
if a backing file is later added, and it is not worth auditing
whether all internal uses such as mirror properly avoid an unmap
request.  Thus, this patch is intentionally limited to just clusters
that are already marked as zero.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-8-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

06cc5e2b

qcow2: Make distinction between zero cluster types obvious · fdfab37d

由 Eric Blake 提交于 5月 06, 2017

Treat plain zero clusters differently from allocated ones, so that
we can simplify the logic of checking whether an offset is present.
Do this by splitting QCOW2_CLUSTER_ZERO into two new enums,
QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC.

I tried to arrange the enum so that we could use
'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and
'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although
I didn't actually end up taking advantage of the layout.

In many cases, this leads to simpler code, by properly combining
cases (sometimes, both zero types pair together, other times,
plain zero is more like unallocated while allocated zero is more
like normal).
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-id: 20170507000552.20847-7-eblake@redhat.com
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

fdfab37d

qcow2: Name typedef for cluster type · 3ef95218

由 Eric Blake 提交于 5月 06, 2017

Although it doesn't add all that much type safety (this is C, after
all), it does add a bit of legibility to use the name QCow2ClusterType
instead of a plain int.

In particular, qcow2_get_cluster_offset() has an overloaded return
type; a QCow2ClusterType on success, and -errno on failure; keeping
the cluster type in a separate variable makes it slightly easier for
the next patch to make further computations based on the type.
Suggested-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-id: 20170507000552.20847-6-eblake@redhat.com
[mreitz: Use the new type in two more places (one of them pulled from
         the next patch)]
Signed-off-by: NMax Reitz <mreitz@redhat.com>

3ef95218

qcow2: Correctly report status of preallocated zero clusters · 4341df8a

由 Eric Blake 提交于 5月 06, 2017

We were throwing away the preallocation information associated with
zero clusters.  But we should be matching the well-defined semantics
in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
while still reminding the user that reading from that offset is
likely to read garbage.

count_contiguous_clusters_by_type() is now used only for unallocated
cluster runs, hence it gets renamed and tightened.

Making this change lets us see which portions of an image are zero
but preallocated, when using qemu-img map --output=json.  The
--output=human side intentionally ignores all zero clusters, whether
or not they are preallocated.

The fact that there is no change to qemu-iotests './check -qcow2'
merely means that we aren't yet testing this aspect of qemu-img;
a later patch will add a test.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-5-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

4341df8a

block: Update comments on BDRV_BLOCK_* meanings · 4c41cb49

由 Eric Blake 提交于 5月 06, 2017

We had some conflicting documentation: a nice 8-way table that
described all possible combinations of DATA, ZERO, and
OFFSET_VALID, contrasted with text that implied that OFFSET_VALID
always meant raw data could be read directly.  Furthermore, the
text refers a lot to bs->file, even though the interface was
updated back in 67a0fd2a to let the driver pass back a specific
BDS (not necessarily bs->file).  As the 8-way table is the
intended semantics, simplify the rest of the text to get rid of
the confusion.

ALLOCATED is always set by the block layer for convenience (drivers
do not have to worry about it).  RAW is used only internally, but
by more than the raw driver.  Document these additional items on
the driver callback.
Suggested-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-4-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

4c41cb49

qcow2: Use consistent switch indentation · bbd995d8

由 Eric Blake 提交于 5月 06, 2017

Fix a couple of inconsistent indentations, before an upcoming
patch further tweaks the switch statements.
(best viewed with 'git diff -b').
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170507000552.20847-3-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

bbd995d8

qcow2: Nicer variable names in qcow2_update_snapshot_refcount() · b32cbae1

由 Eric Blake 提交于 5月 06, 2017

In order to keep checkpatch happy when the next patch changes
indentation, we first have to shorten some long lines.  The easiest
approach is to use a new variable in place of
'offset & L2E_OFFSET_MASK', except that 'offset' is the best name
for that variable.  Change '[old_]offset' to '[old_]entry' to
make room.

While touching things, also fix checkpatch warnings about unusual
'for' statements.

Suggested by Max Reitz <mreitz@redhat.com>
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-id: 20170507000552.20847-2-eblake@redhat.com
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

b32cbae1

tests: Add coverage for recent block geometry fixes · 40812d93

由 Eric Blake 提交于 4月 29, 2017

Use blkdebug's new geometry constraints to emulate setups that
have needed past regression fixes: write zeroes asserting
when running through a loopback block device with max-transfer
smaller than cluster size, and discard rounding away portions
of requests not aligned to preferred boundaries.  Also, add
coverage that the block layer is honoring max transfer limits.

For now, a single iotest performs all actions, with the idea
that we can add future blkdebug constraint test cases in the
same file; but it can be split into multiple iotests if we find
reason to run one portion of the test in more setups than what
are possible in the other.

For reference, the final portion of the test (checking whether
discard passes as much as possible to the lowest layers of the
stack) works as follows:

qemu-io: discard 30M at 80000001, passed to blkdebug
  blkdebug: discard 511 bytes at 80000001, -ENOTSUP (smaller than
blkdebug's 512 align)
  blkdebug: discard 14371328 bytes at 80000512, passed to qcow2
    qcow2: discard 739840 bytes at 80000512, -ENOTSUP (smaller than
qcow2's 1M align)
    qcow2: discard 13M bytes at 77M, succeeds
  blkdebug: discard 15M bytes at 90M, passed to qcow2
    qcow2: discard 15M bytes at 90M, succeeds
  blkdebug: discard 1356800 bytes at 105M, passed to qcow2
    qcow2: discard 1M at 105M, succeeds
    qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's
1M align)
  blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than
blkdebug's 512 align)
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-10-eblake@redhat.com
[mreitz: For cooperation with image locking, add -r to the qemu-io
         invocation which verifies the image content]
Signed-off-by: NMax Reitz <mreitz@redhat.com>

40812d93

blkdebug: Add ability to override unmap geometries · 430b26a8

由 Eric Blake 提交于 4月 29, 2017

Make it easier to simulate various unusual hardware setups (for
example, recent commits 3482b9bc and b8d0a980 affect the Dell
Equallogic iSCSI with its 15M preferred and maximum unmap and
write zero sizing, or b2f95fee deals with the Linux loopback
block device having a max_transfer of 64k), by allowing blkdebug
to wrap any other device with further restrictions on various
alignments.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-9-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

430b26a8

blkdebug: Simplify override logic · 3dc834f8

由 Eric Blake 提交于 4月 29, 2017

Rather than store into a local variable, then copy to the struct
if the value is valid, then reporting errors otherwise, it is
simpler to just store into the struct and report errors if the
value is invalid.  This however requires that the struct store
a 64-bit number, rather than a narrower type.  Likewise, setting
a sane errno value in ret prior to the sequence of parsing and
jumping to out: on error makes it easier for the next patch to
add a chain of similar checks.
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-8-eblake@redhat.com
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

3dc834f8

blkdebug: Add pass-through write_zero and discard support · 63188c24

由 Eric Blake 提交于 4月 29, 2017

In order to test the effects of artificial geometry constraints
on operations like write zero or discard, we first need blkdebug
to manage these actions.  It also allows us to inject errors on
those operations, just like we can for read/write/flush.

We can also test the contract promised by the block layer; namely,
if a device has specified limits on alignment or maximum size,
then those limits must be obeyed (for now, the blkdebug driver
merely inherits limits from whatever it is wrapping, but the next
patch will further enhance it to allow specific limit overrides).

This patch intentionally refuses to service requests smaller than
the requested alignments; this is because an upcoming patch adds
a qemu-iotest to prove that the block layer is correctly handling
fragmentation, but the test only works if there is a way to tell
the difference at artificial alignment boundaries when blkdebug is
using a larger-than-default alignment.  If we let the blkdebug
layer always defer to the underlying layer, which potentially has
a smaller granularity, the iotest will be thwarted.

Tested by setting up an NBD server with export 'foo', then invoking:
$ ./qemu-io
qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
qemu-io> d 0 15M
qemu-io> w -z 0 15M

Pre-patch, the server never sees the discard (it was silently
eaten by the block layer); post-patch it is passed across the
wire.  Likewise, pre-patch the write is always passed with
NBD_WRITE (with 15M of zeroes on the wire), while post-patch
it can utilize NBD_WRITE_ZEROES (for less traffic).
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-7-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

63188c24

blkdebug: Refactor error injection · d157ed5f

由 Eric Blake 提交于 4月 29, 2017

Rather than repeat the logic at each caller of checking if a Rule
exists that warrants an error injection, fold that logic into
inject_error(); and rename it to rule_check() for legibility.
This will help the next patch, which adds two more callers that
need to check rules for the potential of injecting errors.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-6-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

d157ed5f

blkdebug: Sanity check block layer guarantees · e0ef4395

由 Eric Blake 提交于 4月 29, 2017

Commits 04ed95f4 and 1a62d0ac updated the block layer to auto-fragment
any I/O to fit within device boundaries. Additionally, when using a
minimum alignment of 4k, we want to ensure the block layer does proper
read-modify-write rather than requesting I/O on a slice of a sector.
Let's enforce that the contract is obeyed when using blkdebug.  For
now, blkdebug only allows alignment overrides, and just inherits other
limits from whatever device it is wrapping, but a future patch will
further enhance things.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-5-eblake@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

e0ef4395

qemu-io: Switch 'map' output to byte-based reporting · 6f3c90af

由 Eric Blake 提交于 4月 29, 2017

Mixing byte offset and sector allocation counts is a bit
confusing.  Also, reporting n/m sectors, where m decreases
according to the remaining size of the file, isn't really
adding any useful information; and reporting an offset at
both the front and end of the line, with large amounts of
whitespace, is pointless.  Update the output to use byte
counts and shorter lines, then adjust the affected tests
(./check -qcow2 102, ./check -vpc 146).

Note that 'qemu-io map' is MUCH weaker than 'qemu-img map';
the former only shows which regions of the active layer are
allocated, without regards to where the allocation comes from
or whether the allocated portion is known to read as zero
(because it is using the weaker bdrv_is_allocated()); while the
latter (especially in --output=json mode) reports more details
from bdrv_get_block_status().
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-4-eblake@redhat.com
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

6f3c90af

qemu-io: Switch 'alloc' command to byte-based length · 4401fdc7

由 Eric Blake 提交于 4月 29, 2017

For the 'alloc' command, accepting an offset in bytes but a length
in sectors, and reporting output in sectors, is confusing.  Do
everything in bytes, and adjust the expected output accordingly.
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-3-eblake@redhat.com
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

4401fdc7

qemu-io: Improve alignment checks · 1bce6b4c

由 Eric Blake 提交于 4月 29, 2017

Several copy-and-pasted alignment checks exist in qemu-io, which
could use some minor improvements:

- Manual comparison against 0x1ff is not as clean as using our
alignment macros (QEMU_IS_ALIGNED) from osdep.h.

- The error messages aren't quite grammatically correct.
Suggested-by: NPhilippe Mathieu-Daudé <f4bug@amsat.org>
Suggested-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-2-eblake@redhat.com
Reviewed-by: NPhilippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

1bce6b4c

blockdev: use drained_begin/end for qmp_block_resize · 698bdfa0

由 John Snow 提交于 5月 10, 2017

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1447551

If one tries to issue a block_resize while a guest is busy
accessing the disk, it is possible that qemu may deadlock
when invoking aio_poll from both the main loop and the iothread.

Replace another instance of bdrv_drain_all that doesn't
quite belong.

Cc: qemu-stable@nongnu.org
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJohn Snow <jsnow@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

698bdfa0

nvme: Implement Write Zeroes · c03e7ef1

由 Christoph Hellwig 提交于 5月 05, 2017

Signed-off-by: NKeith Busch <keith.busch@intel.com>
[hch: ported over from qemu-nvme.git to mainline]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

c03e7ef1

qemu-img: wait for convert coroutines to complete · b91127ed

由 Anton Nefedov 提交于 4月 26, 2017

On error path (like i/o error in one of the coroutines), it's required to
  - wait for coroutines completion before cleaning the common structures
  - reenter dependent coroutines so they ever finish

Introduced in 2d9187bc.

Cc: qemu-stable@nongnu.org
Signed-off-by: NAnton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: NPeter Lieven <pl@kamp.de>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

b91127ed

file-posix: Remove .bdrv_inactivate/invalidate_cache · 22d5cd82

由 Kevin Wolf 提交于 5月 04, 2017

Now that the block layer takes care to request a lot less permissions
for inactive nodes, the special-casing in file-posix isn't necessary any
more.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>

22d5cd82

block: Fix write/resize permissions for inactive images · 9c5e6594

由 Kevin Wolf 提交于 5月 04, 2017

Format drivers for inactive nodes don't need write/resize permissions on
their bs->file and can share write/resize with another VM (in fact, this
is the whole point of keeping images inactive). Represent this fact in
the op blocker system, so that image locking does the right thing
without special-casing inactive images.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>

9c5e6594

block: Inactivate parents before children · 38701b6a

由 Kevin Wolf 提交于 5月 04, 2017

The proper order for inactivating block nodes is that first the parents
get inactivated and then the children. If we do things in this order, we
can assert that we didn't accidentally leave a parent activated when one
of its child nodes is inactive.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>

38701b6a

block: Drop permissions when migration completes · cfa1a572

由 Kevin Wolf 提交于 5月 04, 2017

With image locking, permissions affect other qemu processes as well. We
want to be sure that the destination can run, so let's drop permissions
on the source when migration completes.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>

cfa1a572

block: New BdrvChildRole.activate() for blk_resume_after_migration() · 4417ab7a

由 Kevin Wolf 提交于 5月 04, 2017

Instead of manually calling blk_resume_after_migration() in migration
code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend
activation with cache invalidation into a single function. This is
achieved with a new callback in BdrvChildRole that is called by
bdrv_invalidate_cache_all().
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>

4417ab7a

migration: Unify block node activation error handling · ace21a58

由 Kevin Wolf 提交于 5月 04, 2017

Migration code activates all block driver nodes on the destination when
the migration completes. It does so by calling
bdrv_invalidate_cache_all() and blk_resume_after_migration(). There is
one code path for precopy and one for postcopy migration, resulting in
four function calls, which used to have three different failure modes.

This patch unifies the behaviour so that failure to activate all block
nodes is non-fatal, but the error message is logged and the VM isn't
automatically started. 'cont' will retry activating the block nodes.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>

ace21a58

iotests: Extend test 066 · aa93c834

由 Max Reitz 提交于 5月 04, 2017

066 was supposed to be a test "for discarding preallocated zero
clusters", but it did so incompletely: While it did check the image
file's integrity after the operation, it did not confirm that the
clusters are indeed freed. This patch adds this test.

In addition, new cases for writing to preallocated zero clusters are
added.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

aa93c834

qcow2: Discard preallocated zero clusters · 293073a5

由 Max Reitz 提交于 5月 04, 2017

In discard_single_l2(), we completely discard normal clusters instead of
simply turning them into preallocated zero clusters. That means we
should probably do the same with such preallocated zero clusters:
Discard them instead of keeping them allocated.
Reported-by: NEric Blake <eblake@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

293073a5