- 13 10月, 2017 2 次提交
-
-
由 Manos Pitsidianakis 提交于
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NFam Zheng <famz@redhat.com> Signed-off-by: NManos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Manos Pitsidianakis 提交于
BlockDriverState has a bdrv_co_drain() callback but no equivalent for the end of the drain. The throttle driver (block/throttle.c) needs a way to mark the end of the drain in order to toggle io_limits_disabled correctly, thus bdrv_co_drain_end is needed. Signed-off-by: NManos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NFam Zheng <famz@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
- 06 10月, 2017 5 次提交
-
-
由 Eric Blake 提交于
Improve our braindead copy-on-read implementation. Pre-patch, we have multiple issues: - we create a bounce buffer and perform a write for the entire request, even if the active image already has 99% of the clusters occupied, and really only needs to copy-on-read the remaining 1% of the clusters - our bounce buffer was as large as the read request, and can needlessly exhaust our memory by using double the memory of the request size (the original request plus our bounce buffer), rather than a capped maximum overhead beyond the original - if a driver has a max_transfer limit, we are bypassing the normal code in bdrv_aligned_preadv() that fragments to that limit, and instead attempt to read the entire buffer from the driver in one go, which some drivers may assert on - a client can request a large request of nearly 2G such that rounding the request out to cluster boundaries results in a byte count larger than 2G. While this cannot exceed 32 bits, it DOES have some follow-on problems: -- the call to bdrv_driver_pread() can assert for exceeding BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks .bdrv_co_preadv -- if the buffer is all zeroes, the subsequent call to bdrv_co_do_pwrite_zeroes is a no-op due to a negative size, which means we did not actually copy on read Fix all of these issues by breaking up the action into a loop, where each iteration is capped to sane limits. Also, querying the allocation status allows us to optimize: when data is already present in the active layer, we don't need to bounce. Note that the code has a telling comment that copy-on-read should probably be a filter driver rather than a bolt-on hack in io.c; but that remains a task for another day. CC: qemu-stable@nongnu.org Signed-off-by: NEric Blake <eblake@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Eric Blake 提交于
Make it possible to inject errors on writes performed during a read operation due to copy-on-read semantics. Signed-off-by: NEric Blake <eblake@redhat.com> Reviewed-by: NJeff Cody <jcody@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Reviewed-by: NJohn Snow <jsnow@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Eric Blake 提交于
Handle a 0-length block status request up front, with a uniform return value claiming the area is not allocated. Most callers don't pass a length of 0 to bdrv_get_block_status() and friends; but it definitely happens with a 0-length read when copy-on-read is enabled. While we could audit all callers to ensure that they never make a 0-length request, and then assert that fact, it was just as easy to fix things to always report success (as long as the callers are careful to not go into an infinite loop). However, we had inconsistent behavior on whether the status is reported as allocated or defers to the backing layer, depending on what callbacks the driver implements, and possibly wasting quite a few CPU cycles to get to that answer. Consistently reporting unallocated up front doesn't really hurt anything, and makes it easier both for callers (0-length requests now have well-defined behavior) and for drivers (drivers don't have to deal with 0-length requests). Signed-off-by: NEric Blake <eblake@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Eric Blake 提交于
Both callers already had bytes available, but were scaling to sectors. Move the scaling to internal code. In the case of bdrv_aligned_pwritev(), we are now passing the exact offset rather than a rounded sector-aligned value, but that's okay as long as dirty bitmap widens start/bytes to granularity boundaries. Signed-off-by: NEric Blake <eblake@redhat.com> Reviewed-by: NJohn Snow <jsnow@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Reviewed-by: NFam Zheng <famz@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Eric Blake 提交于
Signed-off-by: NEric Blake <eblake@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 05 9月, 2017 1 次提交
-
-
由 Manos Pitsidianakis 提交于
bdrv_co_get_block_status_from_file() and bdrv_co_get_block_status_from_backing() set *file to bs->file and bs->backing respectively, so that bdrv_co_get_block_status() can recurse to them. Future block drivers won't have to duplicate code to implement this. Reviewed-by: NFam Zheng <famz@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Signed-off-by: NManos Pitsidianakis <el13635@mail.ntua.gr> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 07 8月, 2017 1 次提交
-
-
由 Daniel P. Berrange 提交于
There are trace probes in bdrv_co_readv|writev, however, the block drivers are being gradually moved over to using the bdrv_co_preadv|pwritev functions instead. As a result some block drivers miss the current probes. Move the probes into bdrv_co_preadv|pwritev instead, so that they are triggered by more (all?) I/O code paths. Signed-off-by: NDaniel P. Berrange <berrange@redhat.com> Message-id: 20170804105036.11879-1-berrange@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
- 17 7月, 2017 2 次提交
-
-
由 Denis V. Lunev 提交于
We've had a shadowed 'ret' variable, which risks returning the wrong value, introduced in commit b9c64947. Signed-off-by: NDenis V. Lunev <den@openvz.org> Reviewed-by: NFam Zheng <famz@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Message-id: 20170710150559.30163-1-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Paolo Bonzini 提交于
This will let the callback take a CoMutex in the next patch. Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NFam Zheng <famz@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20170629132749.997-8-pbonzini@redhat.com> Signed-off-by: NFam Zheng <famz@redhat.com>
-
- 11 7月, 2017 1 次提交
-
-
It will be needed in following commits for persistent bitmaps. If bitmap is loaded from read-only storage (and we can't mark it "in use" in this storage) corresponding BdrvDirtyBitmap should be read-only. Signed-off-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20170628120530.31251-11-vsementsov@virtuozzo.com Signed-off-by: NMax Reitz <mreitz@redhat.com>
-
- 10 7月, 2017 5 次提交
-
-
由 Eric Blake 提交于
We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the signature of the function to use int64_t *pnum ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned, but that can be relaxed when a later patch implements byte-based block status. Therefore, for the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_is_allocated(). But some code, particularly stream_run(), gets a lot simpler because it no longer has to mess with sectors. Leave comments where we can further simplify by switching to byte-based iterations, once later patches eliminate the need for sector-aligned operations. For ease of review, bdrv_is_allocated() was tackled separately. Signed-off-by: NEric Blake <eblake@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Eric Blake 提交于
bdrv_is_allocated_above() was relying on intermediate->total_sectors, which is a field that can have stale contents depending on the value of intermediate->has_variable_length. An audit shows that we are safe (we were first calling through bdrv_co_get_block_status() which in turn calls bdrv_nb_sectors() and therefore just refreshed the current length), but it's nicer to favor our accessor functions to avoid having to repeat such an audit, even if it means refresh_total_sectors() is called more frequently. Suggested-by: NJohn Snow <jsnow@redhat.com> Signed-off-by: NEric Blake <eblake@redhat.com> Reviewed-by: NManos Pitsidianakis <el13635@mail.ntua.gr> Reviewed-by: NJeff Cody <jcody@redhat.com> Reviewed-by: NJohn Snow <jsnow@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Eric Blake 提交于
We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the signature of the function to use int64_t *pnum ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned on input and that *pnum is sector-aligned on return to the caller, but that can be relaxed when a later patch implements byte-based block status. Therefore, this code adds usages like DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned values, where the call might reasonbly give non-aligned results in the future; on the other hand, no rounding is needed for callers that should just continue to work with byte alignment. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_is_allocated(). But some code, particularly bdrv_commit(), gets a lot simpler because it no longer has to mess with sectors; also, it is now possible to pass NULL if the caller does not care how much of the image is allocated beyond the initial offset. Leave comments where we can further simplify once a later patch eliminates the need for sector-aligned requests through bdrv_is_allocated(). For ease of review, bdrv_is_allocated_above() will be tackled separately. Signed-off-by: NEric Blake <eblake@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Eric Blake 提交于
Now that the last user [mirror_iteration()] has converted to using bytes, we no longer need a function to round sectors to clusters. Signed-off-by: NEric Blake <eblake@redhat.com> Reviewed-by: NJohn Snow <jsnow@redhat.com> Reviewed-by: NJeff Cody <jcody@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Eric Blake 提交于
We document that *file is valid if the return is not an error and includes BDRV_BLOCK_OFFSET_VALID, but forgot to obey this contract when a driver (such as blkdebug) lacks a callback. Messed up in commit 67a0fd2a (v2.6), when we added the file parameter. Enhance qemu-iotest 177 to cover this, using a sequence that would print garbage or even SEGV, because it was dererefencing through uninitialized memory. [The resulting test output shows that we have less-than-ideal block status from the blkdebug driver, but that's a separate fix coming up soon.] Setting *file on all paths that return BDRV_BLOCK_OFFSET_VALID is enough to fix the crash, but we can go one step further: always setting *file, even on error, means that a broken caller that blindly dereferences file without checking for error is now more likely to get a reliable SEGV instead of randomly acting on garbage, making it easier to diagnose such buggy callers. Adding an assertion that file is set where expected doesn't hurt either. CC: qemu-stable@nongnu.org Signed-off-by: NEric Blake <eblake@redhat.com> Reviewed-by: NFam Zheng <famz@redhat.com> Reviewed-by: NMax Reitz <mreitz@redhat.com> Reviewed-by: NJohn Snow <jsnow@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 30 6月, 2017 2 次提交
-
-
由 Eric Blake 提交于
When we have a BDS with unallocated clusters, but asking the status of its underlying bs->file or backing layer encounters an end-of-file condition, we know that the rest of the unallocated area will read as zeroes. However, pre-patch, this required two separate calls to bdrv_get_block_status(), as the first call stops at the point where the underlying file ends. Thanks to BDRV_BLOCK_EOF, we can now widen the results of the primary status if the secondary status already includes BDRV_BLOCK_ZERO. In turn, this fixes a TODO mentioned in iotest 154, where we can now see that all sectors in a partial cluster at the end of a file read as zero when coupling the shorter backing file's status along with our knowledge that the remaining sectors came from an unallocated cluster. Also, note that the loop in bdrv_co_get_block_status_above() had an inefficent exit: in cases where the active layer sets BDRV_BLOCK_ZERO but does NOT set BDRV_BLOCK_ALLOCATED (namely, where we know we read zeroes merely because our unallocated clusters lie beyond the backing file's shorter length), we still ended up probing the backing layer even though we already had a good answer. Signed-off-by: NEric Blake <eblake@redhat.com> Message-Id: <20170505021500.19315-3-eblake@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NFam Zheng <famz@redhat.com>
-
由 Eric Blake 提交于
Just as the block layer already sets BDRV_BLOCK_ALLOCATED as a shortcut for subsequent operations, there are also some optimizations that are made easier if we can quickly tell that *pnum will advance us to the end of a file, via a new BDRV_BLOCK_EOF which gets set by the block layer. This just plumbs up the new bit; subsequent patches will make use of it. Signed-off-by: NEric Blake <eblake@redhat.com> Message-Id: <20170505021500.19315-2-eblake@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NFam Zheng <famz@redhat.com>
-
- 26 6月, 2017 4 次提交
-
-
由 Manos Pitsidianakis 提交于
Change the 'int count' parameter in *pwrite_zeros, *pdiscard related functions (and some others) to 'int bytes', as they both refer to bytes. This helps with code legibility. Signed-off-by: NManos Pitsidianakis <el13635@mail.ntua.gr> Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NMax Reitz <mreitz@redhat.com>
-
由 Kevin Wolf 提交于
These functions are unused now. Signed-off-by: NKevin Wolf <kwolf@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Stefan Hajnoczi 提交于
Calling aio_poll() directly may have been fine previously, but this is the future, man! The difference between an aio_poll() loop and BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext around aio_poll(). This allows the IOThread to run fd handlers or BHs to complete the request. Failure to release the AioContext causes deadlocks. Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object iothread. Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Stefan Hajnoczi 提交于
Call bdrv_inc/dec_in_flight() for vmstate reads/writes. This seems unnecessary at first glance because vmstate reads/writes are done synchronously while the guest is stopped. But we need the bdrv_wakeup() in bdrv_dec_in_flight() so the main loop sees request completion. Besides, it's cleaner to count vmstate reads/writes like ordinary read/write requests. The bdrv_wakeup() partially fixes a 'savevm' hang with -object iothread. Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 16 6月, 2017 8 次提交
-
-
由 Paolo Bonzini 提交于
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-14-pbonzini@redhat.com> Signed-off-by: NFam Zheng <famz@redhat.com>
-
由 Paolo Bonzini 提交于
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-13-pbonzini@redhat.com> Signed-off-by: NFam Zheng <famz@redhat.com>
-
由 Paolo Bonzini 提交于
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-12-pbonzini@redhat.com> Signed-off-by: NFam Zheng <famz@redhat.com>
-
由 Paolo Bonzini 提交于
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-7-pbonzini@redhat.com> Signed-off-by: NFam Zheng <famz@redhat.com>
-
由 Paolo Bonzini 提交于
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-6-pbonzini@redhat.com> Signed-off-by: NFam Zheng <famz@redhat.com>
-
由 Paolo Bonzini 提交于
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-5-pbonzini@redhat.com> Signed-off-by: NFam Zheng <famz@redhat.com>
-
由 Paolo Bonzini 提交于
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NAlberto Garcia <berto@igalia.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-3-pbonzini@redhat.com> Signed-off-by: NFam Zheng <famz@redhat.com>
-
由 Paolo Bonzini 提交于
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-2-pbonzini@redhat.com> Signed-off-by: NFam Zheng <famz@redhat.com>
-
- 25 5月, 2017 1 次提交
-
-
由 Paolo Bonzini 提交于
Remove use of block_job_pause/resume from outside blockjob.c, thus making them static. The new functions are used by the block layer, so place them in blockjob_int.h. Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NJohn Snow <jsnow@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NJeff Cody <jcody@redhat.com> Message-id: 20170508141310.8674-5-pbonzini@redhat.com Signed-off-by: NJeff Cody <jcody@redhat.com>
-
- 12 5月, 2017 1 次提交
-
-
由 Eric Blake 提交于
Since we are already in coroutine context during the body of bdrv_co_get_block_status(), we can shave off a few layers of wrappers when recursing to query the protocol when a format driver returned BDRV_BLOCK_RAW. Note that we are already using the correct recursion later on in the same function, when probing whether the protocol layer is sparse in order to find out if we can add BDRV_BLOCK_ZERO to an existing BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID. Signed-off-by: NEric Blake <eblake@redhat.com> Reviewed-by: NMax Reitz <mreitz@redhat.com> Reviewed-by: NFam Zheng <famz@redhat.com> Message-id: 20170504173745.27414-1-eblake@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
- 27 4月, 2017 3 次提交
-
-
由 Denis V. Lunev 提交于
tail_padding_bytes is calculated wrong. F.e. for offset = 0 bytes = 2048 align = 512 we will have tail_padding_bytes = 512 which is definitely wrong. The patch fixes that arithmetics. Fortunately this problem is harmless, we will have 1 extra allocation and free thus there is no need to put this into stable. The problem is here from the very beginning. Signed-off-by: NDenis V. Lunev <den@openvz.org> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Fam Zheng 提交于
Reported by Coverity. We already use bs in bdrv_inc_in_flight before checking for NULL. It is unnecessary as all callers pass non-NULL bs, so drop it. Signed-off-by: NFam Zheng <famz@redhat.com> Reviewed-by: NMax Reitz <mreitz@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Max Reitz 提交于
This reverts commit e3e0003a. This commit was necessary for the 2.9 release because we were unable to fix the underlying issue(s) in time. However, we will be for 2.10. Signed-off-by: NMax Reitz <mreitz@redhat.com> Acked-by: NFam Zheng <famz@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 18 4月, 2017 1 次提交
-
-
由 Fam Zheng 提交于
The recursive bdrv_drain_recurse may run a block job completion BH that drops nodes. The coming changes will make that more likely and use-after-free would happen without this patch Stash the bs pointer and use bdrv_ref/bdrv_unref in addition to QLIST_FOREACH_SAFE to prevent such a case from happening. Since bdrv_unref accesses global state that is not protected by the AioContext lock, we cannot use bdrv_ref/bdrv_unref unconditionally. Fortunately the protection is not needed in IOThread because only main loop can modify a graph with the AioContext lock held. Signed-off-by: NFam Zheng <famz@redhat.com> Message-Id: <20170418143044.12187-2-famz@redhat.com> Reviewed-by: NJeff Cody <jcody@redhat.com> Tested-by: NJeff Cody <jcody@redhat.com> Signed-off-by: NFam Zheng <famz@redhat.com>
-
- 11 4月, 2017 3 次提交
-
-
由 Max Reitz 提交于
In case of block migration, there may be writes to BlockBackends that do not have the write permission taken. Before this issue is fixed (which is not going to happen in 2.9), we therefore cannot assert that this is the case. Suggested-by: NKevin Wolf <kwolf@redhat.com> Signed-off-by: NMax Reitz <mreitz@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Tested-by: NKevin Wolf <kwolf@redhat.com> Message-id: 20170411145050.31290-1-mreitz@redhat.com Tested-by: NLaurent Vivier <lvivier@redhat.com> Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
-
由 Fam Zheng 提交于
bdrv_inc_in_flight and bdrv_dec_in_flight are mandatory for BDRV_POLL_WHILE to work, even for the shortcut case where flush is unnecessary. Move the if block to below bdrv_dec_in_flight, and BTW fix the variable declaration position. Signed-off-by: NFam Zheng <famz@redhat.com> Acked-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Fam Zheng 提交于
BDRV_POLL_WHILE waits for the started I/O by releasing bs's ctx then polling the main context, which relies on the yielded coroutine continuing on bs->ctx before notifying qemu_aio_context with bdrv_wakeup(). Thus, using qemu_coroutine_enter to start I/O is wrong because if the coroutine is entered from main loop, co->ctx will be qemu_aio_context, as a result of the "release, poll, acquire" loop of BDRV_POLL_WHILE, race conditions happen when both main thread and the iothread access the same BDS: main loop iothread ----------------------------------------------------------------------- blockdev_snapshot aio_context_acquire(bs->ctx) virtio_scsi_data_plane_handle_cmd bdrv_drained_begin(bs->ctx) bdrv_flush(bs) bdrv_co_flush(bs) aio_context_acquire(bs->ctx).enter ... qemu_coroutine_yield(co) BDRV_POLL_WHILE() aio_context_release(bs->ctx) aio_context_acquire(bs->ctx).return ... aio_co_wake(co) aio_poll(qemu_aio_context) ... co_schedule_bh_cb() ... qemu_coroutine_enter(co) ... /* (A) bdrv_co_flush(bs) /* (B) I/O on bs */ continues... */ aio_context_release(bs->ctx) aio_context_acquire(bs->ctx) Note that in above case, bdrv_drained_begin() doesn't do the "release, poll, acquire" in BDRV_POLL_WHILE, because bs->in_flight == 0. Fix this by using bdrv_coroutine_enter and enter coroutine in the right context. iotests 109 output is updated because the coroutine reenter flow during mirror job complete is different (now through co_queue_wakeup, instead of the unconditional qemu_coroutine_switch before), making the end job len different. Signed-off-by: NFam Zheng <famz@redhat.com> Acked-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com>
-