  1. 13 Oct 2017, 2 commits
  2. 06 Oct 2017, 5 commits
    • block: Perform copy-on-read in loop · cb2e2878
      Committed by Eric Blake
      Improve our braindead copy-on-read implementation.  Pre-patch,
      we have multiple issues:
      - we create a bounce buffer and perform a write for the entire
      request, even if the active image already has 99% of the
      clusters occupied, and really only needs to copy-on-read the
      remaining 1% of the clusters
      - our bounce buffer is as large as the read request, and can
      needlessly exhaust memory by doubling the footprint of the
      request (the original request plus our bounce buffer),
      rather than adding a capped maximum overhead beyond the original
      - if a driver has a max_transfer limit, we are bypassing the
      normal code in bdrv_aligned_preadv() that fragments to that
      limit, and instead attempt to read the entire buffer from the
      driver in one go, which some drivers may assert on
      - a client can issue a request of nearly 2G such that
      rounding the request out to cluster boundaries results in a
      byte count larger than 2G.  While this cannot exceed 32 bits,
      it DOES have some follow-on problems:
      -- the call to bdrv_driver_pread() can assert for exceeding
      BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks
      .bdrv_co_preadv
      -- if the buffer is all zeroes, the subsequent call to
      bdrv_co_do_pwrite_zeroes is a no-op due to a negative size,
      which means we did not actually copy on read
      
      Fix all of these issues by breaking up the action into a loop,
      where each iteration is capped to sane limits.  Also, querying
      the allocation status allows us to optimize: when data is
      already present in the active layer, we don't need to bounce
      (see the sketch after this entry).
      
      Note that the code has a telling comment that copy-on-read
      should probably be a filter driver rather than a bolt-on hack
      in io.c; but that remains a task for another day.
      
      CC: qemu-stable@nongnu.org
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
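
      A minimal standalone sketch of the capped-loop shape described above.
      Every name in it (MAX_BOUNCE_BUFFER, is_allocated, read_chunk,
      write_chunk, and the stub allocation map) is a hypothetical stand-in
      for the QEMU internals, not the actual implementation:

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define MAX_BOUNCE_BUFFER (32 * 1024 * 1024) /* cap per-iteration overhead */
        #define MIN(a, b) ((a) < (b) ? (a) : (b))

        /* Stub stand-ins for bdrv_is_allocated() and the driver read/write
         * calls; for the demo, everything below 1 MiB is "already allocated". */
        static bool is_allocated(int64_t offset, int64_t bytes, int64_t *pnum)
        {
            const int64_t boundary = 1024 * 1024;
            if (offset < boundary) {
                *pnum = MIN(bytes, boundary - offset);
                return true;
            }
            *pnum = bytes;
            return false;
        }
        static int read_chunk(int64_t off, int64_t n, void *buf)
        {
            (void)off; (void)n; (void)buf; return 0;
        }
        static int write_chunk(int64_t off, int64_t n, const void *buf)
        {
            (void)off; (void)n; (void)buf; return 0;
        }

        static int copy_on_read(int64_t offset, int64_t bytes)
        {
            void *bounce = NULL;
            int ret = 0;

            while (bytes > 0) {
                int64_t pnum;
                if (is_allocated(offset, bytes, &pnum)) {
                    /* Active layer already has this range: nothing to bounce. */
                    offset += pnum;
                    bytes -= pnum;
                    continue;
                }
                /* Cap each iteration so the bounce buffer stays a bounded
                 * overhead and no single driver call sees a huge request. */
                int64_t chunk = MIN(pnum, MAX_BOUNCE_BUFFER);
                if (!bounce) {
                    bounce = malloc(MAX_BOUNCE_BUFFER);
                    if (!bounce) {
                        return -1;
                    }
                }
                ret = read_chunk(offset, chunk, bounce);
                if (ret == 0) {
                    ret = write_chunk(offset, chunk, bounce);
                }
                if (ret < 0) {
                    break;
                }
                offset += chunk;
                bytes -= chunk;
            }
            free(bounce);
            return ret;
        }

        int main(void)
        {
            printf("copy_on_read: %d\n", copy_on_read(0, 64 * 1024 * 1024));
            return 0;
        }
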
    • block: Add blkdebug hook for copy-on-read · d855ebcd
      Committed by Eric Blake
      Make it possible to inject errors on writes performed during a
      read operation due to copy-on-read semantics.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
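
      A sketch of how the new hook could be exercised, assuming the blkdebug
      event added by this patch is named cor_write and that qemu-io's -C flag
      enables copy-on-read (the file names are made up):

        # blkdebug.conf: fail the first write triggered by copy-on-read
        [inject-error]
        event = "cor_write"
        errno = "5"
        once = "on"

        # A read of an unallocated cluster then forces the COR write to fail:
        #   qemu-io -C -c 'read 0 64k' blkdebug:blkdebug.conf:test.qcow2
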
    • block: Uniform handling of 0-length bdrv_get_block_status() · 9cdcfd9f
      Committed by Eric Blake
      Handle a 0-length block status request up front, with a uniform
      return value claiming the area is not allocated.
      
      Most callers don't pass a length of 0 to bdrv_get_block_status()
      and friends; but it definitely happens with a 0-length read when
      copy-on-read is enabled.  While we could audit all callers to
      ensure that they never make a 0-length request, and then assert
      that fact, it was just as easy to fix things to always report
      success (as long as the callers are careful to not go into an
      infinite loop).  However, we had inconsistent behavior on whether
      the status was reported as allocated or deferred to the backing
      layer, depending on which callbacks the driver implements, and
      possibly wasted quite a few CPU cycles to get to that answer.
      Consistently reporting unallocated up front doesn't really hurt
      anything, and makes it easier both for callers (0-length requests
      now have well-defined behavior) and for drivers (drivers don't
      have to deal with 0-length requests).
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
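
      A minimal sketch of the up-front check, with an invented flag constant
      standing in for the real BDRV_BLOCK_* values:

        #include <assert.h>
        #include <stdint.h>

        #define BLOCK_ALLOCATED 0x1   /* stand-in for BDRV_BLOCK_ALLOCATED */

        /* Stand-in for bdrv_get_block_status(): answer 0-length queries
         * uniformly, before any driver callback is consulted. */
        static int get_block_status(int64_t offset, int64_t bytes, int64_t *pnum)
        {
            if (bytes == 0) {
                *pnum = 0;
                return 0;             /* success, uniformly "not allocated" */
            }
            (void)offset;             /* ... normal driver query goes here ... */
            *pnum = bytes;
            return BLOCK_ALLOCATED;
        }

        int main(void)
        {
            int64_t pnum;
            int ret = get_block_status(4096, 0, &pnum);
            assert(ret == 0 && pnum == 0);   /* well-defined, driver never asked */
            return 0;
        }
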
    • dirty-bitmap: Switch bdrv_set_dirty() to bytes · 0fdf1a4f
      Committed by Eric Blake
      Both callers already had bytes available, but were scaling to
      sectors.  Move the scaling to internal code.  In the case of
      bdrv_aligned_pwritev(), we are now passing the exact offset
      rather than a rounded sector-aligned value, but that's okay
      as long as the dirty-bitmap code widens start/bytes to granularity
      boundaries.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
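
      A sketch of the widening described above, with hypothetical helper
      names (the real code sets bits in an HBitmap rather than returning
      the widened range):

        #include <stdint.h>
        #include <stdio.h>

        #define ALIGN_DOWN(x, a) ((x) / (a) * (a))
        #define ALIGN_UP(x, a)   ALIGN_DOWN((x) + (a) - 1, (a))

        /* Stand-in for bdrv_set_dirty(): take an exact byte range and widen
         * it to granularity boundaries, so an unrounded offset is safe. */
        static void set_dirty(int64_t offset, int64_t bytes, int64_t granularity,
                              int64_t *first, int64_t *last)
        {
            *first = ALIGN_DOWN(offset, granularity);
            *last = ALIGN_UP(offset + bytes, granularity);
            /* ... mark [*first, *last) in the dirty bitmap ... */
        }

        int main(void)
        {
            int64_t first, last;
            set_dirty(1000, 100, 512, &first, &last);   /* unaligned request */
            /* Prints [512, 1536): widened out to granularity boundaries. */
            printf("dirtied [%lld, %lld)\n", (long long)first, (long long)last);
            return 0;
        }
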
    • block: Typo fix in copy_on_readv() · 765d9df9
      Committed by Eric Blake
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  3. 05 Sep 2017, 1 commit
  4. 07 Aug 2017, 1 commit
  5. 17 Jul 2017, 2 commits
  6. 11 Jul 2017, 1 commit
  7. 10 Jul 2017, 5 commits
    • block: Make bdrv_is_allocated_above() byte-based · 51b0a488
      Committed by Eric Blake
      We are gradually moving away from sector-based interfaces, towards
      byte-based.  In the common case, allocation is unlikely to ever use
      values that are not naturally sector-aligned, but it is possible
      that byte-based values will let us be more precise about allocation
      at the end of an unaligned file that can do byte-based access.
      
      Changing the signature of the function to use int64_t *pnum ensures
      that the compiler enforces that all callers are updated.  For now,
      the io.c layer still assert()s that all callers are sector-aligned,
      but that can be relaxed when a later patch implements byte-based
      block status.  Therefore, for the most part this patch is just the
      addition of scaling at the callers followed by inverse scaling at
      bdrv_is_allocated().  But some code, particularly stream_run(),
      gets a lot simpler because it no longer has to mess with sectors.
      Leave comments where we can further simplify by switching to
      byte-based iterations, once later patches eliminate the need for
      sector-aligned operations.
      
      For ease of review, bdrv_is_allocated() was tackled separately.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
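
      A sketch of the signature change (paraphrased from the description
      above; the _old name is invented for contrast).  Switching *pnum to
      int64_t makes every unconverted caller fail to compile:

        #include <stdint.h>

        typedef struct BlockDriverState BlockDriverState;

        /* Before: sector-based, with an int *pnum. */
        int bdrv_is_allocated_above_old(BlockDriverState *top,
                                        BlockDriverState *base,
                                        int64_t sector_num, int nb_sectors,
                                        int *pnum);

        /* After: byte-based.  The int64_t *pnum forces a compile error at
         * every caller that was not updated, so none can be missed. */
        int bdrv_is_allocated_above(BlockDriverState *top,
                                    BlockDriverState *base,
                                    int64_t offset, int64_t bytes,
                                    int64_t *pnum);
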
    • block: Minimize raw use of bds->total_sectors · c00716be
      Committed by Eric Blake
      bdrv_is_allocated_above() was relying on intermediate->total_sectors,
      which is a field that can have stale contents depending on the value
      of intermediate->has_variable_length.  An audit shows that we are safe
      (we were first calling through bdrv_co_get_block_status() which in
      turn calls bdrv_nb_sectors() and therefore just refreshed the current
      length), but it's nicer to favor our accessor functions to avoid having
      to repeat such an audit, even if it means refresh_total_sectors() is
      called more frequently.
      Suggested-by: John Snow <jsnow@redhat.com>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Reviewed-by: John Snow <jsnow@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
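
      A sketch of the pattern, with simplified types standing in for the
      QEMU structures:

        #include <stdint.h>

        typedef struct BDS {
            int64_t total_sectors;    /* cached; stale when length can vary */
            int has_variable_length;
        } BDS;

        /* Stand-in for bdrv_nb_sectors(): an accessor that refreshes the
         * cache for variable-length devices before answering, so callers
         * never have to audit whether the raw field is current. */
        static int64_t nb_sectors(BDS *bs)
        {
            if (bs->has_variable_length) {
                /* ... refresh_total_sectors(): re-query the driver ... */
            }
            return bs->total_sectors;
        }

        int main(void)
        {
            BDS bs = { 2048, 1 };
            /* Callers go through the accessor, never bs->total_sectors raw. */
            return nb_sectors(&bs) == 2048 ? 0 : 1;
        }
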
    • block: Make bdrv_is_allocated() byte-based · d6a644bb
      Committed by Eric Blake
      We are gradually moving away from sector-based interfaces, towards
      byte-based.  In the common case, allocation is unlikely to ever use
      values that are not naturally sector-aligned, but it is possible
      that byte-based values will let us be more precise about allocation
      at the end of an unaligned file that can do byte-based access.
      
      Changing the signature of the function to use int64_t *pnum ensures
      that the compiler enforces that all callers are updated.  For now,
      the io.c layer still assert()s that all callers are sector-aligned
      on input and that *pnum is sector-aligned on return to the caller,
      but that can be relaxed when a later patch implements byte-based
      block status.  Therefore, this code adds usages like
      DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned
      values, where the call might reasonably give non-aligned results
      in the future; on the other hand, no rounding is needed for callers
      that should just continue to work with byte alignment.
      
      For the most part this patch is just the addition of scaling at the
      callers followed by inverse scaling at bdrv_is_allocated().  But
      some code, particularly bdrv_commit(), gets a lot simpler because it
      no longer has to mess with sectors; also, it is now possible to pass
      NULL if the caller does not care how much of the image is allocated
      beyond the initial offset.  Leave comments where we can further
      simplify once a later patch eliminates the need for sector-aligned
      requests through bdrv_is_allocated().
      
      For ease of review, bdrv_is_allocated_above() will be tackled
      separately.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
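
      A sketch of the caller-side scaling: query in bytes, then round up to
      sectors only where a sector count is still needed (is_allocated is a
      stand-in for the byte-based bdrv_is_allocated(); DIV_ROUND_UP is as
      in QEMU's osdep.h):

        #include <stdint.h>
        #include <stdio.h>

        #define BDRV_SECTOR_SIZE 512
        #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

        /* Stand-in: pretend the whole queried range is allocated. */
        static int is_allocated(int64_t offset, int64_t bytes, int64_t *pnum)
        {
            (void)offset;
            *pnum = bytes;
            return 1;
        }

        int main(void)
        {
            int64_t count;
            is_allocated(0, 1000, &count);        /* byte-based query */
            /* A caller still thinking in sectors rounds up, so a partial
             * sector at end-of-file counts as allocated instead of lost: */
            int nr_sectors = DIV_ROUND_UP(count, BDRV_SECTOR_SIZE);
            printf("%d sectors\n", nr_sectors);   /* 1000 bytes -> 2 sectors */
            return 0;
        }
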
    • block: Drop unused bdrv_round_sectors_to_clusters() · e8a81e9c
      Committed by Eric Blake
      Now that the last user [mirror_iteration()] has converted to using
      bytes, we no longer need a function to round sectors to clusters.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Guarantee that *file is set on bdrv_get_block_status() · 81c219ac
      Committed by Eric Blake
      We document that *file is valid if the return is not an error and
      includes BDRV_BLOCK_OFFSET_VALID, but forgot to obey this contract
      when a driver (such as blkdebug) lacks a callback.  Messed up in
      commit 67a0fd2a (v2.6), when we added the file parameter.
      
      Enhance qemu-iotest 177 to cover this, using a sequence that would
      print garbage or even SEGV, because it was dereferencing through
      uninitialized memory.  [The resulting test output shows that we
      have less-than-ideal block status from the blkdebug driver, but
      that's a separate fix coming up soon.]
      
      Setting *file on all paths that return BDRV_BLOCK_OFFSET_VALID is
      enough to fix the crash, but we can go one step further: always
      setting *file, even on error, means that a broken caller that
      blindly dereferences file without checking for error is now more
      likely to get a reliable SEGV instead of randomly acting on garbage,
      making it easier to diagnose such buggy callers.  Adding an
      assertion that file is set where expected doesn't hurt either.
      
      CC: qemu-stable@nongnu.org
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: John Snow <jsnow@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
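
      A sketch of the contract fix with simplified flags and types:
      initialize *file up front so even error paths leave it NULL, and
      assert it is set whenever OFFSET_VALID is returned:

        #include <assert.h>
        #include <stddef.h>
        #include <stdint.h>

        #define BLOCK_OFFSET_VALID 0x2  /* stand-in for BDRV_BLOCK_OFFSET_VALID */

        typedef struct BDS { struct BDS *file; } BDS;

        static int get_block_status(BDS *bs, int64_t offset,
                                    int *status, BDS **file)
        {
            /* Always initialize: a buggy caller that dereferences *file
             * without checking now gets a reliable NULL crash, not garbage. */
            *file = NULL;
            (void)offset;

            /* Default path for drivers without a status callback: the
             * data visibly lives in bs->file. */
            *status = BLOCK_OFFSET_VALID;
            *file = bs->file;

            assert(!(*status & BLOCK_OFFSET_VALID) || *file);   /* contract */
            return 0;
        }

        int main(void)
        {
            BDS child = { NULL }, bs = { &child };
            int status;
            BDS *file;
            get_block_status(&bs, 0, &status, &file);
            assert(file == &child);
            return 0;
        }
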
  8. 30 Jun 2017, 2 commits
    • block: Exploit BDRV_BLOCK_EOF for larger zero blocks · c61e684e
      Committed by Eric Blake
      When we have a BDS with unallocated clusters, but asking the status
      of its underlying bs->file or backing layer encounters an end-of-file
      condition, we know that the rest of the unallocated area will read as
      zeroes.  However, pre-patch, this required two separate calls to
      bdrv_get_block_status(), as the first call stops at the point where
      the underlying file ends.  Thanks to BDRV_BLOCK_EOF, we can now widen
      the results of the primary status if the secondary status already
      includes BDRV_BLOCK_ZERO.
      
      In turn, this fixes a TODO mentioned in iotest 154, where we can now
      see that all sectors in a partial cluster at the end of a file read
      as zero when coupling the shorter backing file's status along with our
      knowledge that the remaining sectors came from an unallocated cluster.
      
      Also, note that the loop in bdrv_co_get_block_status_above() had an
      inefficient exit: in cases where the active layer sets BDRV_BLOCK_ZERO
      but does NOT set BDRV_BLOCK_ALLOCATED (namely, where we know we read
      zeroes merely because our unallocated clusters lie beyond the backing
      file's shorter length), we still ended up probing the backing layer
      even though we already had a good answer.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20170505021500.19315-3-eblake@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Fam Zheng <famz@redhat.com>
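
      A sketch of the widening step with invented flag values: when the
      backing query hits EOF and reports zeroes, the primary result can
      claim zeroes for the whole remaining range without a second query:

        #include <stdint.h>
        #include <stdio.h>

        #define BLOCK_ZERO 0x1   /* stand-ins for BDRV_BLOCK_ZERO / _EOF */
        #define BLOCK_EOF  0x4

        /* Combine an unallocated primary status with the backing file's
         * status: past the backing file's EOF, unallocated reads are zero. */
        static int widen_status(int ret, int64_t *pnum, int64_t want,
                                int ret2, int64_t pnum2)
        {
            if ((ret2 & BLOCK_EOF) && (ret2 & BLOCK_ZERO) && pnum2 < want) {
                *pnum = want;             /* rest of the range reads as zero */
                return ret | BLOCK_ZERO;
            }
            *pnum = pnum2;
            return ret | (ret2 & BLOCK_ZERO);
        }

        int main(void)
        {
            int64_t pnum;
            /* Backing file ends after 512 of the 4096 bytes asked about: */
            int ret = widen_status(0, &pnum, 4096, BLOCK_ZERO | BLOCK_EOF, 512);
            printf("zero=%d pnum=%lld\n", !!(ret & BLOCK_ZERO), (long long)pnum);
            return 0;   /* whole 4096 bytes known zero from one query */
        }
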
    • block: Add BDRV_BLOCK_EOF to bdrv_get_block_status() · fb0d8654
      Committed by Eric Blake
      Just as the block layer already sets BDRV_BLOCK_ALLOCATED as a
      shortcut for subsequent operations, there are also some optimizations
      that are made easier if we can quickly tell that *pnum will advance
      us to the end of a file, via a new BDRV_BLOCK_EOF which gets set
      by the block layer.
      
      This just plumbs up the new bit; subsequent patches will make use
      of it.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20170505021500.19315-2-eblake@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Fam Zheng <famz@redhat.com>
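
      A sketch of the plumbing with an invented flag value: the block layer
      tags the returned status whenever *pnum runs to end-of-file:

        #include <stdint.h>
        #include <stdio.h>

        #define BLOCK_EOF 0x4   /* stand-in for BDRV_BLOCK_EOF */

        /* After computing status for [offset, offset + pnum), set the new
         * bit when the range reaches the end of the file, so callers can
         * skip further queries. */
        static int tag_eof(int ret, int64_t offset, int64_t pnum,
                           int64_t total_size)
        {
            if (offset + pnum == total_size) {
                ret |= BLOCK_EOF;
            }
            return ret;
        }

        int main(void)
        {
            printf("%d\n", tag_eof(0, 512, 512, 1024));   /* prints 4: EOF set */
            return 0;
        }
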
  9. 26 Jun 2017, 4 commits
  10. 16 Jun 2017, 8 commits
  11. 25 May 2017, 1 commit
  12. 12 May 2017, 1 commit
  13. 27 Apr 2017, 3 commits
  14. 18 Apr 2017, 1 commit
    • block: Walk bs->children carefully in bdrv_drain_recurse · 178bd438
      Committed by Fam Zheng
      The recursive bdrv_drain_recurse may run a block job completion BH that
      drops nodes. The coming changes will make that more likely, and
      use-after-free would happen without this patch.
      
      Stash the bs pointer and use bdrv_ref/bdrv_unref in addition to
      QLIST_FOREACH_SAFE to prevent such a case from happening.
      
      Since bdrv_unref accesses global state that is not protected by the AioContext
      lock, we cannot use bdrv_ref/bdrv_unref unconditionally.  Fortunately the
      protection is not needed in an IOThread, because only the main loop can
      modify the graph with the AioContext lock held.
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Message-Id: <20170418143044.12187-2-famz@redhat.com>
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Tested-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Fam Zheng <famz@redhat.com>
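
      A sketch of the safe walk, with QLIST_FOREACH_SAFE modeled by a plain
      next-pointer loop and in_main_loop standing in for the AioContext
      check in the real patch (all types simplified):

        #include <stdbool.h>
        #include <stddef.h>

        typedef struct BDS BDS;

        typedef struct Child {
            BDS *bs;
            struct Child *next;
        } Child;

        struct BDS {
            int refcnt;
            Child *children;
        };

        static void ref(BDS *bs)   { bs->refcnt++; }
        static void unref(BDS *bs) { if (--bs->refcnt == 0) { /* free */ } }
        static void drain(BDS *bs);

        /* A completion BH run inside drain() may unlink nodes, so grab the
         * next pointer first and pin each child with a reference while it
         * is being drained (taking references is only legal in the main
         * loop, hence the conditional). */
        static void drain_recurse(BDS *bs, bool in_main_loop)
        {
            Child *child = bs->children;
            while (child) {
                Child *next = child->next;   /* survives child being unlinked */
                BDS *child_bs = child->bs;   /* stash: child may go away */
                if (in_main_loop) {
                    ref(child_bs);           /* pin against use-after-free */
                }
                drain(child_bs);
                if (in_main_loop) {
                    unref(child_bs);
                }
                child = next;
            }
        }

        static void drain(BDS *bs)
        {
            drain_recurse(bs, false);
        }

        int main(void)
        {
            BDS leaf = { 1, NULL };
            Child c = { &leaf, NULL };
            BDS root = { 1, &c };
            drain_recurse(&root, true);
            return 0;
        }
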
  15. 11 Apr 2017, 3 commits
    • block/io: Comment out permission assertions · e3e0003a
      Committed by Max Reitz
      In case of block migration, there may be writes to BlockBackends that do
      not have the write permission taken. Until this issue is fixed (which
      is not going to happen in 2.9), we cannot assert that the permission
      is held.
      Suggested-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Tested-by: Kevin Wolf <kwolf@redhat.com>
      Message-id: 20170411145050.31290-1-mreitz@redhat.com
      Tested-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • block: Fix bdrv_co_flush early return · 49ca6259
      Committed by Fam Zheng
      bdrv_inc_in_flight and bdrv_dec_in_flight are mandatory for
      BDRV_POLL_WHILE to work, even for the shortcut case where flush is
      unnecessary. Move the if block to below bdrv_inc_in_flight, and while
      at it fix the variable declaration position.
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
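
      A sketch of the fix's shape with hypothetical helper names: the
      in_flight counter must bracket even the "nothing to flush" path, so
      the shortcut moves below the increment and leaves through a label
      that decrements:

        #include <stdbool.h>

        typedef struct BDS {
            int in_flight;
            bool has_drv;
            bool read_only;
        } BDS;

        static void inc_in_flight(BDS *bs) { bs->in_flight++; }
        static void dec_in_flight(BDS *bs) { bs->in_flight--; }

        /* BDRV_POLL_WHILE relies on in_flight being non-zero while a
         * request is pending, so even the shortcut takes the inc/dec. */
        static int co_flush(BDS *bs)
        {
            int ret = 0;

            inc_in_flight(bs);
            if (!bs->has_drv || bs->read_only) {
                goto early_exit;          /* flush unnecessary, but balanced */
            }
            /* ... actual flush to the driver and to bs->file ... */

        early_exit:
            dec_in_flight(bs);
            return ret;
        }

        int main(void)
        {
            BDS bs = { 0, false, false };
            return co_flush(&bs);
        }
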
    • block: Use bdrv_coroutine_enter to start I/O coroutines · e92f0e19
      Committed by Fam Zheng
      BDRV_POLL_WHILE waits for the started I/O by releasing bs's ctx then polling
      the main context, which relies on the yielded coroutine continuing on bs->ctx
      before notifying qemu_aio_context with bdrv_wakeup().
      
      Thus, using qemu_coroutine_enter to start I/O is wrong: if the coroutine
      is entered from the main loop, co->ctx will be qemu_aio_context, and as
      a result of the "release, poll, acquire" loop of BDRV_POLL_WHILE, race
      conditions occur when both the main thread and the iothread access the
      same BDS:
      
        main loop                                iothread
      -----------------------------------------------------------------------
        blockdev_snapshot
          aio_context_acquire(bs->ctx)
                                                 virtio_scsi_data_plane_handle_cmd
          bdrv_drained_begin(bs->ctx)
          bdrv_flush(bs)
            bdrv_co_flush(bs)                      aio_context_acquire(bs->ctx).enter
              ...
              qemu_coroutine_yield(co)
            BDRV_POLL_WHILE()
              aio_context_release(bs->ctx)
                                                   aio_context_acquire(bs->ctx).return
                                                     ...
                                                       aio_co_wake(co)
              aio_poll(qemu_aio_context)               ...
                co_schedule_bh_cb()                    ...
                  qemu_coroutine_enter(co)             ...
      
                    /* (A) bdrv_co_flush(bs)           /* (B) I/O on bs */
                            continues... */
                                                   aio_context_release(bs->ctx)
              aio_context_acquire(bs->ctx)
      
      Note that in above case, bdrv_drained_begin() doesn't do the "release,
      poll, acquire" in BDRV_POLL_WHILE, because bs->in_flight == 0.
      
      Fix this by using bdrv_coroutine_enter to enter the coroutine in the
      right context.
      
      iotests 109 output is updated because the coroutine re-entry flow during
      mirror job completion is different (now through co_queue_wakeup, instead
      of the unconditional qemu_coroutine_switch before), making the final job
      len different.
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
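
      A sketch of the helper the callers switch to, paraphrased from the
      QEMU sources of this era (declarations abbreviated to opaque types;
      an illustration, not a buildable unit outside the tree):

        typedef struct AioContext AioContext;
        typedef struct Coroutine Coroutine;
        typedef struct BlockDriverState BlockDriverState;

        AioContext *bdrv_get_aio_context(BlockDriverState *bs);
        void aio_co_enter(AioContext *ctx, Coroutine *co);

        /* Enter the coroutine in bs's own context, so co->ctx is bs->ctx
         * and a later aio_co_wake() continues it there instead of in
         * qemu_aio_context: */
        void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
        {
            aio_co_enter(bdrv_get_aio_context(bs), co);
        }
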