1. 04 June 2019, 2 commits
    • block/io: Delay decrementing the quiesce_counter · 5cb2737e
      Max Reitz committed
      When ending a drained section, bdrv_do_drained_end() currently first
      decrements the quiesce_counter, and only then actually ends the drain.
      
      The bdrv_drain_invoke(bs, false) call may cause graph changes.  Say the
      graph change involves replacing an existing BB's ("blk") BDS
      (blk_bs(blk)) by @bs.  Let us introduce the following values:
      - bs_oqc = old_quiesce_counter
        (so bs->quiesce_counter == bs_oqc - 1)
      - obs_qc = blk_bs(blk)->quiesce_counter (before bdrv_drain_invoke())
      
      Let us assume there is no blk_pread_unthrottled() involved, so
      blk->quiesce_counter == obs_qc (before bdrv_drain_invoke()).
      
      Now replacing blk_bs(blk) by @bs will reduce blk->quiesce_counter by
      obs_qc (making it 0) and increase it by bs_oqc-1 (making it bs_oqc-1).
      
      bdrv_drain_invoke() returns and we invoke bdrv_parent_drained_end().
      This will decrement blk->quiesce_counter by one, so it would be -1 --
      were there not an assertion against that in blk_root_drained_end().
      
      We therefore have to keep the quiesce_counter up at least until
      bdrv_drain_invoke() returns, so that bdrv_parent_drained_end() does the
      right thing for the parents @bs got during bdrv_drain_invoke().
      
      But let us delay it even further, namely until bdrv_parent_drained_end()
      returns, because then it mirrors bdrv_do_drained_begin(): There, we
      first increment the quiesce_counter, then begin draining the parents,
      and then call bdrv_drain_invoke().  It makes sense to let
      bdrv_do_drained_end() unravel this exactly in reverse.
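      
      For illustration, the resulting order looks roughly like this (a
      simplified sketch with the recursion and ignore parameters trimmed;
      not the exact QEMU code):
      
        static void bdrv_do_drained_end(BlockDriverState *bs)
        {
            int old_quiesce_counter;
      
            assert(bs->quiesce_counter > 0);
      
            /* Re-enable things in the driver; this may change the graph and
             * give @bs new parents. */
            bdrv_drain_invoke(bs, false);
      
            /* End the drain in all parents, including any gained above. */
            bdrv_parent_drained_end(bs);
      
            /* Only now drop the counter, mirroring bdrv_do_drained_begin()
             * exactly in reverse. */
            old_quiesce_counter = atomic_fetch_dec(&bs->quiesce_counter);
            if (old_quiesce_counter == 1) {
                aio_enable_external(bdrv_get_aio_context(bs));
            }
        }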
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: avoid recursive block_status call if possible · 69f47505
      Vladimir Sementsov-Ogievskiy committed
      Since commit 5daa74a6, bdrv_co_block_status() digs into bs->file for an
      additional, more accurate search for holes inside regions reported as
      DATA by bs.
      
      This accuracy is not free: assume we have a qcow2 disk. qcow2 itself
      already knows where the holes are and where the data is, but every
      block_status request additionally calls lseek on the underlying file.
      For a big disk full of data, any iterative copying block job (or img
      convert) will call lseek(HOLE) on every iteration, and each of these
      lseeks has to iterate through all metadata up to the end of the file.
      This is obviously inefficient behavior, and in many scenarios we don't
      need this lseek at all.
      
      However, lseek is still needed when we have a metadata-preallocated image.
      
      So, let's detect the metadata-preallocation case and avoid digging into
      qcow2's protocol file in the other cases.
      
      The idea is to compare the allocation size from the filesystem's point of
      view with the allocation size from qcow2's point of view (derived from the
      refcounts). If the filesystem-level allocation is significantly lower,
      consider it a metadata-preallocation case.
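      
      A rough sketch of such a detector (hypothetical helper name and an
      illustrative threshold; the real qcow2_detect_metadata_preallocation()
      derives the qcow2-side cluster count from the refcount table):
      
        static bool detect_metadata_preallocation(BlockDriverState *bs,
                                                  uint64_t qcow2_allocated_clusters,
                                                  uint64_t cluster_size)
        {
            /* How much did the filesystem actually allocate for the data file? */
            int64_t real_allocation = bdrv_get_allocated_file_size(bs->file->bs);
            int64_t real_clusters;
      
            if (real_allocation < 0) {
                return false;   /* unknown: assume no metadata preallocation */
            }
      
            real_clusters = real_allocation / cluster_size;
      
            /* Far fewer bytes on disk than qcow2 considers allocated:
             * this looks like metadata preallocation, so keep the lseek. */
            return qcow2_allocated_clusters &&
                   real_clusters < qcow2_allocated_clusters / 2;
        }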
      
      iotest 102 changed, as our detector cannot recognize a shrunk file as
      metadata preallocation. That does not seem wrong, since with metadata
      preallocation we always have a valid file length.
      
      Two other iotests have a slight change in their QMP output sequence:
      Active 'block-commit' returns earlier because the job coroutine yields
      earlier on a blocking operation. This operation is loading the refcount
      blocks in qcow2_detect_metadata_preallocation().
      Suggested-by: Denis V. Lunev <den@openvz.org>
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2. 20 May 2019, 1 commit
3. 10 May 2019, 2 commits
4. 26 March 2019, 2 commits
5. 22 February 2019, 1 commit
6. 01 February 2019, 1 commit
    • block: Fix hangs in synchronous APIs with iothreads · 4720cbee
      Kevin Wolf committed
      In the block layer, synchronous APIs are often implemented by creating a
      coroutine that calls the asynchronous coroutine-based implementation and
      then waiting for completion with BDRV_POLL_WHILE().
      
      For this to work with iothreads (more specifically, when the synchronous
      API is called in a thread that is not the home thread of the block
      device, so that the coroutine will run in a different thread), we must
      make sure to call aio_wait_kick() at the end of the operation. Many
      places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if
      the condition has long become false.
      
      Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This
      corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is
      generally not enough for most other operations because they haven't set
      the return value in the coroutine entry stub yet. To avoid race
      conditions there, we need to kick after setting the return value.
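      
      The pattern looks roughly like this (a sketch with hypothetical
      bdrv_foo()/bdrv_co_foo() names, not a specific function from the patch):
      
        typedef struct FooCoData {
            BlockDriverState *bs;
            int ret;
            bool done;
        } FooCoData;
      
        static void coroutine_fn bdrv_foo_co_entry(void *opaque)
        {
            FooCoData *data = opaque;
      
            data->ret = bdrv_co_foo(data->bs);  /* hypothetical coroutine-based impl. */
            data->done = true;
            /* Wake up the BDRV_POLL_WHILE() below even if it polls in another
             * thread; without this kick it can hang although done is true. */
            aio_wait_kick();
        }
      
        int bdrv_foo(BlockDriverState *bs)
        {
            FooCoData data = { .bs = bs, .done = false };
            Coroutine *co = qemu_coroutine_create(bdrv_foo_co_entry, &data);
      
            bdrv_coroutine_enter(bs, co);       /* run in the node's AioContext */
            BDRV_POLL_WHILE(bs, !data.done);
            return data.ret;
        }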
      
      The race window is small enough that the problem doesn't usually surface
      in the common path. However, it does surface and causes easily
      reproducible hangs if the operation can return early before even calling
      bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op
      success paths).
      
      The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is
      slightly different: These functions even neglected to schedule the
      coroutine in the home thread of the node. This avoids the hang, but is
      obviously wrong, too. Fix those to schedule the coroutine in the right
      AioContext in addition to adding aio_wait_kick() calls.
      
      Cc: qemu-stable@nongnu.org
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7. 25 September 2018, 3 commits
    • block: Use a single global AioWait · cfe29d82
      Kevin Wolf committed
      When draining a block node, we recurse to its parent and for subtree
      drains also to its children. A single AIO_WAIT_WHILE() is then used to
      wait for bdrv_drain_poll() to become true, which depends on all of the
      nodes we recursed to. However, if the respective child or parent becomes
      quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent
      is checked, while AIO_WAIT_WHILE() depends on the AioWait of the
      original node.
      
      Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE().
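      
      Conceptually the fix amounts to something like this (a sketch of the
      idea; names follow the aio-wait code, but this is not the literal patch):
      
        static AioWait global_aio_wait;    /* the one and only waiter record */
      
        static void dummy_bh_cb(void *opaque)
        {
            /* Nothing to do; scheduling the BH is enough to re-evaluate the
             * AIO_WAIT_WHILE() condition in the waiting thread. */
        }
      
        void aio_wait_kick(void)
        {
            /* Whoever might have changed any AIO_WAIT_WHILE() condition kicks
             * the single global waiter instead of a per-node AioWait. */
            if (atomic_read(&global_aio_wait.num_waiters)) {
                aio_bh_schedule_oneshot(qemu_get_aio_context(), dummy_bh_cb, NULL);
            }
        }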
      
      This may mean that the draining thread gets a few more unnecessary
      wakeups because an unrelated operation got completed, but we already
      wake it up when something _could_ have changed rather than only if it
      has certainly changed.
      
      Apart from that, drain is a slow path anyway. In theory it would be
      possible to use wakeups more selectively and still correctly, but the
      gains are likely not worth the additional complexity. In fact, this
      patch is a nice simplification for some places in the code.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
    • block: Remove aio_poll() in bdrv_drain_poll variants · 4cf077b5
      Kevin Wolf committed
      bdrv_drain_poll_top_level() was buggy because it didn't release the
      AioContext lock of the node to be drained before calling aio_poll().
      This way, callbacks called by aio_poll() would possibly take the lock a
      second time and run into a deadlock with a nested AIO_WAIT_WHILE() call.
      
      However, it turns out that the aio_poll() call isn't actually needed any
      more. It was introduced in commit 91af091f, which is effectively
      reverted by this patch. The cases it was supposed to fix are now covered
      by bdrv_drain_poll(), which waits for block jobs to reach a quiescent
      state.
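      
      After this change the top-level helper reduces to roughly the following
      (sketch):
      
        static bool bdrv_drain_poll_top_level(BlockDriverState *bs, bool recursive,
                                              BdrvChild *ignore_parent)
        {
            /* An aio_poll(bdrv_get_aio_context(bs), false) call used to sit here
             * while the AioContext lock was still held, which could deadlock
             * against a nested AIO_WAIT_WHILE().  The polling now happens only
             * in the caller's BDRV_POLL_WHILE(), for which this function merely
             * provides the condition. */
            return bdrv_drain_poll(bs, recursive, ignore_parent, false);
        }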
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
    • block: Add missing locking in bdrv_co_drain_bh_cb() · aa1361d5
      Kevin Wolf committed
      bdrv_do_drained_begin/end() assume that they are called with the
      AioContext lock of bs held. If we call drain functions from a coroutine
      with the AioContext lock held, we yield and schedule a BH to move out of
      coroutine context. This means that the lock for the home context of the
      coroutine is released and must be re-acquired in the bottom half.
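      
      The bottom half therefore brackets the drain with the lock of the node's
      AioContext, roughly like this (sketch; parameter lists are trimmed and
      error paths omitted):
      
        static void bdrv_co_drain_bh_cb(void *opaque)
        {
            BdrvCoDrainData *data = opaque;
            BlockDriverState *bs = data->bs;
            AioContext *ctx = bdrv_get_aio_context(bs);
      
            /* The coroutine released this lock when it yielded; re-acquire it
             * before touching the node. */
            aio_context_acquire(ctx);
            bdrv_dec_in_flight(bs);
            if (data->begin) {
                bdrv_do_drained_begin(bs, data->recursive, data->parent);
            } else {
                bdrv_do_drained_end(bs, data->recursive, data->parent);
            }
            aio_context_release(ctx);
      
            data->done = true;
            aio_co_wake(data->co);    /* reenter the original coroutine */
        }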
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
8. 10 July 2018, 14 commits
9. 03 July 2018, 2 commits
10. 29 June 2018, 4 commits
11. 18 June 2018, 8 commits
    • block: Allow graph changes in bdrv_drain_all_begin/end sections · 0f12264e
      Kevin Wolf committed
      bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and
      did a subtree drain for each of them. This works fine as long as the
      graph is static, but sadly, reality looks different.
      
      If the graph changes so that root nodes are added or removed, we would
      have to compensate for this. bdrv_next() returns each root node only
      once even if it's the root node for multiple BlockBackends or for a
      monitor-owned block driver tree, which would only complicate things.
      
      The much easier and more obviously correct way is to fundamentally
      change the way the functions work: Iterate over all BlockDriverStates,
      no matter who owns them, and drain them individually. Compensation is
      only necessary when a new BDS is created inside a drain_all section.
      Removal of a BDS doesn't require any action because it's gone afterwards
      anyway.
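      
      The begin side then becomes a loop over every node, roughly (sketch;
      the coroutine bounce, the final AIO_WAIT_WHILE() and the drain
      parameters are abbreviated):
      
        void bdrv_drain_all_begin(void)
        {
            BlockDriverState *bs = NULL;
      
            bdrv_drain_all_count++;   /* lets node creation compensate inside the section */
      
            /* Quiesce every BlockDriverState individually, no matter who owns it. */
            while ((bs = bdrv_next_all_states(bs))) {
                AioContext *ctx = bdrv_get_aio_context(bs);
      
                aio_context_acquire(ctx);
                bdrv_do_drained_begin(bs, false, NULL);  /* no subtree recursion */
                aio_context_release(ctx);
            }
      
            /* A single polling phase for all nodes follows here. */
        }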
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: ignore_bds_parents parameter for drain functions · 6cd5c9d7
      Kevin Wolf committed
      In the future, bdrv_drain_all_begin/end() will drain all individual
      nodes separately rather than whole subtrees. This means that we don't
      want to propagate the drain to all parents any more: If the parent is a
      BDS, it will already be drained separately. Recursing to all parents is
      unnecessary work and would make it an O(n²) operation.
      
      Prepare the drain function for the changed drain_all by adding an
      ignore_bds_parents parameter to the internal implementation that
      prevents the propagation of the drain to BDS parents. We still (have to)
      propagate it to non-BDS parents like BlockBackends or Jobs because those
      are not drained separately.
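      
      In the parent propagation this boils down to a check like the following
      (sketch; it assumes the child role carries a parent_is_bds flag, as this
      series adds):
      
        void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
                                       bool ignore_bds_parents)
        {
            BdrvChild *c, *next;
      
            QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
                if (c == ignore ||
                    (ignore_bds_parents && c->role->parent_is_bds)) {
                    /* BDS parents are drained as nodes in their own right by
                     * drain_all; skipping them here avoids O(n^2) recursion. */
                    continue;
                }
                if (c->role->drained_begin) {
                    c->role->drained_begin(c);
                }
            }
        }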
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Move bdrv_drain_all_begin() out of coroutine context · c8ca33d0
      Kevin Wolf committed
      Before we can introduce a single polling loop for all nodes in
      bdrv_drain_all_begin(), we must make sure to run it outside of coroutine
      context like we already do for bdrv_do_drained_begin().
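      
      The usual pattern for this in the drain code is to bounce through a
      bottom half when entered from a coroutine, for example (sketch; the
      helper's argument list is simplified here):
      
        void bdrv_drain_all_begin(void)
        {
            if (qemu_in_coroutine()) {
                /* Schedule a BH outside of coroutine context and yield until
                 * it has done the actual draining. */
                bdrv_co_yield_to_drain(NULL, true);
                return;
            }
      
            /* From here on a nested event loop (AIO_WAIT_WHILE) can run
             * safely, since we are not sitting on some coroutine's stack. */
        }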
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Defer .bdrv_drain_begin callback to polling phase · 0109e7e6
      Kevin Wolf committed
      We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
      done with propagating the drain through the graph and are doing the
      single final BDRV_POLL_WHILE().
      
      Just schedule the coroutine with the callback and increase bs->in_flight
      to make sure that the polling phase will wait for it.
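      
      Schematically (a sketch of the begin side only; the end side still polls
      for the callback to finish):
      
        static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
        {
            BdrvCoDrainData *data = opaque;
            BlockDriverState *bs = data->bs;
      
            bs->drv->bdrv_co_drain_begin(bs);
      
            bdrv_dec_in_flight(bs);   /* now the polling phase can complete */
            g_free(data);
        }
      
        static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
        {
            BdrvCoDrainData *data;
      
            if (!bs->drv || !begin || !bs->drv->bdrv_co_drain_begin) {
                return;               /* (sketch handles only the begin side) */
            }
      
            data = g_new0(BdrvCoDrainData, 1);
            data->bs = bs;
      
            /* Count the driver callback as an in-flight operation so that the
             * final BDRV_POLL_WHILE() waits for it instead of polling here. */
            bdrv_inc_in_flight(bs);
            data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
            aio_co_schedule(bdrv_get_aio_context(bs), data->co);
        }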
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Don't poll in parent drain callbacks · dcf94a23
      Kevin Wolf committed
      bdrv_do_drained_begin() is only safe if we have a single
      BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow
      that parent callbacks introduce a nested polling loop that could cause
      graph changes while we're traversing the graph.
      
      Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single
      node without waiting for its requests to complete. These requests will
      be waited for in the BDRV_POLL_WHILE() call down the call chain.
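      
      The split roughly looks like this (sketch; recursion, coroutine handling
      and the ignore parameters are left out):
      
        /* Quiesce a single node; no polling, so no graph changes can happen
         * behind the caller's back. */
        void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent)
        {
            assert(!qemu_in_coroutine());
      
            if (atomic_fetch_inc(&bs->quiesce_counter) == 0) {
                aio_disable_external(bdrv_get_aio_context(bs));
            }
      
            bdrv_parent_drained_begin(bs, parent);
            bdrv_drain_invoke(bs, true);
        }
      
        static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent)
        {
            bdrv_do_drained_begin_quiesce(bs, parent);
      
            /* The only nested event loop: wait for the whole affected subtree
             * at once, further down the call chain. */
            BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, false, parent));
        }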
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Drain recursively with a single BDRV_POLL_WHILE() · fe4f0614
      Kevin Wolf committed
      Anything can happen inside BDRV_POLL_WHILE(), including graph
      changes that may interfere with its callers (e.g. child list iteration
      in recursive callers of bdrv_do_drained_begin).
      
      Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the
      end of bdrv_do_drained_begin() to avoid such effects. The recursion
      happens now inside the loop condition. As the graph can only change
      between bdrv_drain_poll() calls, but not inside of it, doing the
      recursion here is safe.
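      
      The condition side then looks roughly like this (sketch; the parameter
      lists are slightly simplified):
      
        /* true as long as anything in the subtree still needs to quiesce */
        bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
                             BdrvChild *ignore_parent)
        {
            BdrvChild *child, *next;
      
            if (bdrv_parent_drained_poll(bs, ignore_parent)) {
                return true;
            }
      
            if (atomic_read(&bs->in_flight)) {
                return true;
            }
      
            if (recursive) {
                /* The graph can only change between evaluations of the
                 * BDRV_POLL_WHILE() condition, not while this runs, so
                 * recursing here is safe. */
                QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
                    if (bdrv_drain_poll(child->bs, recursive, child)) {
                        return true;
                    }
                }
            }
      
            return false;
        }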
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Remove bdrv_drain_recurse() · d30b8e64
      Kevin Wolf committed
      For bdrv_drain(), recursively waiting for child node requests is
      pointless because we didn't quiesce their parents, so new requests could
      come in anyway. Letting the function work only on a single node makes it
      more consistent.
      
      For subtree drains and drain_all, we already have the recursion in
      bdrv_do_drained_begin(), so the extra recursion doesn't add anything
      either.
      
      Remove the useless code.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Really pause block jobs on drain · 89bd0305
      Kevin Wolf committed
      We already requested that block jobs be paused in .bdrv_drained_begin,
      but no guarantee was made that the job was actually inactive at the
      point where bdrv_drained_begin() returned.
      
      This introduces a new callback BdrvChildRole.bdrv_drained_poll() and
      uses it to make bdrv_drain_poll() consider block jobs using the node to
      be drained.
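      
      The job side of the new callback is essentially a predicate like the
      following (an illustrative sketch, not the exact upstream check; the
      results of all parents are then OR'ed together by the drain poll loop):
      
        static bool child_job_drained_poll(BdrvChild *c)
        {
            BlockJob *bjob = c->opaque;
            Job *job = &bjob->job;
      
            /* A job that is paused or not busy cannot submit new requests,
             * so the drained parent may consider it quiescent. */
            if (!job->busy || job->paused || job->deferred_to_main_loop) {
                return false;
            }
      
            /* Otherwise bdrv_drain_poll() has to keep waiting for us. */
            return true;
        }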
      
      For the test case to work as expected, we have to switch from
      block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even
      considered active and must be waited for when draining the node.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>