- 24 10月, 2012 2 次提交
-
-
由 Paolo Bonzini 提交于
Error management is important for mirroring; otherwise, an error on the target (even something as "innocent" as ENOSPC) requires to start again with a full copy. Similar to on_read_error/on_write_error, two separate knobs are provided for on_source_error (reads) and on_target_error (writes). The default is 'report' for both. The 'ignore' policy will leave the sector dirty, so that it will be retried later. Thus, it will not cause corruption. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Paolo Bonzini 提交于
This patch adds the implementation of a new job that mirrors a disk to a new image while letting the guest continue using the old image. The target is treated as a "black box" and data is copied from the source to the target in the background. This can be used for several purposes, including storage migration, continuous replication, and observation of the guest I/O in an external program. It is also a first step in replacing the inefficient block migration code that is part of QEMU. The job is possibly never-ending, but it is logically structured into two phases: 1) copy all data as fast as possible until the target first gets in sync with the source; 2) keep target in sync and ensure that reopening to the target gets a correct (full) copy of the source data. The second phase is indicated by the progress in "info block-jobs" reporting the current offset to be equal to the length of the file. When the job is cancelled in the second phase, QEMU will run the job until the source is clean and quiescent, then it will report successful completion of the job. In other words, the BLOCK_JOB_CANCELLED event means that the target may _not_ be consistent with a past state of the source; the BLOCK_JOB_COMPLETED event means that the target is consistent with a past state of the source. (Note that it could already happen that management lost the race against QEMU and got a completion event instead of cancellation). It is not yet possible to complete the job and switch over to the target disk. The next patches will fix this and add many refinements to the basic idea introduced here. These include improved error management, some tunable knobs and performance optimizations. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 29 9月, 2012 7 次提交
-
-
由 Paolo Bonzini 提交于
This patch adds support for error management to streaming. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Paolo Bonzini 提交于
The following behaviors are possible: 'report': The behavior is the same as in 1.1. An I/O error, respectively during a read or a write, will complete the job immediately with an error code. 'ignore': An I/O error, respectively during a read or a write, will be ignored. For streaming, the job will complete with an error and the backing file will be left in place. For mirroring, the sector will be marked again as dirty and re-examined later. 'stop': The job will be paused and the job iostatus will be set to failed or nospace, while the VM will keep running. This can only be specified if the block device has rerror=stop and werror=stop or enospc. 'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others. In all cases, even for 'report', the I/O error is reported as a QMP event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR. It is possible that while stopping the VM a BLOCK_IO_ERROR event will be reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa. This is not really avoidable since stopping the VM completes all pending I/O requests. In fact, it is already possible now that a series of BLOCK_IO_ERROR events are reported with rerror=stop, because vm_stop calls bdrv_drain_all and this can generate further errors. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Paolo Bonzini 提交于
This will let block-stream reuse the enum. Places that used the enums are renamed accordingly. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Paolo Bonzini 提交于
We want to remove knowledge of BLOCK_ERR_STOP_ENOSPC from drivers; drivers should only be told whether to stop/report/ignore the error. On the other hand, we want to keep using the nicer BlockErrorAction name in the drivers. So rename the enums, while leaving aside the names of the enum values for now. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Paolo Bonzini 提交于
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Paolo Bonzini 提交于
Do this in a separate commit before we move the functions to blockjob.h. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Jeff Cody 提交于
This adds the live commit coroutine. This iteration focuses on the commit only below the active layer, and not the active layer itself. The behaviour is similar to block streaming; the sectors are walked through, and anything that exists above 'base' is committed back down into base. At the end, intermediate images are deleted, and the chain stitched together. Images are restored to their original open flags upon completion. Signed-off-by: NJeff Cody <jcody@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 24 9月, 2012 2 次提交
-
-
由 Jeff Cody 提交于
The keep_read_only flag is no longer used, in favor of the bdrv flag BDRV_O_ALLOW_RDWR. Signed-off-by: NJeff Cody <jcody@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Jeff Cody 提交于
This is based on Supriya Kannery's bdrv_reopen() patch series. This provides a transactional method to reopen multiple images files safely. Image files are queue for reopen via bdrv_reopen_queue(), and the reopen occurs when bdrv_reopen_multiple() is called. Changes are staged in bdrv_reopen_prepare() and in the equivalent driver level functions. If any of the staged images fails a prepare, then all of the images left untouched, and the staged changes for each image abandoned. Block drivers are passed a reopen state structure, that contains: * BDS to reopen * flags for the reopen * opaque pointer for any driver-specific data that needs to be persistent from _prepare to _commit/_abort * reopen queue pointer, if the driver needs to queue additional BDS for a reopen Signed-off-by: NJeff Cody <jcody@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 14 8月, 2012 1 次提交
-
-
由 Luiz Capitulino 提交于
Several block/ files are relying on qerror.h being provided by qapi-types.h. Fix this, as a future commit will change qapi-types.h not to provide qerror.h. Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com> Reviewed-by: NMarkus Armbruster <armbru@redhat.com>
-
- 07 8月, 2012 1 次提交
-
-
由 Stefan Hajnoczi 提交于
Lazy refcounts is a performance optimization for qcow2 that postpones refcount metadata updates and instead marks the image dirty. In the case of crash or power failure the image will be left in a dirty state and repaired next time it is opened. Reducing metadata I/O is important for cache=writethrough and cache=directsync because these modes guarantee that data is on disk after each write (hence we cannot take advantage of caching updates in RAM). Refcount metadata is not needed for guest->file block address translation and therefore does not need to be on-disk at the time of write completion - this is the motivation behind the lazy refcount optimization. The lazy refcount optimization must be enabled at image creation time: qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough Update qemu-iotests 031 and 036 since the extension header size changes when we add feature bit table entries. Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 17 7月, 2012 1 次提交
-
-
由 Markus Armbruster 提交于
There are two producers of these hints: drive_init() on behalf of -drive, and hd_geometry_guess(). The only consumer of the hint is hd_geometry_guess(). The callers of hd_geometry_guess() call it only when drive_init() didn't set the hints. Therefore, drive_init()'s hints are never used. Thus, hd_geometry_guess() only ever sees hints it produced itself in a prior call. Only the first call computes something, subsequent calls just repeat the first call's results. However, hd_geometry_guess() is never called more than once: the device models don't, and the block device is destroyed on unplug. Thus, dropping the repeat feature doesn't break anything now. If a block device wasn't destroyed on unplug and could be reused with a new device, then repeating old results would be wrong. Thus, dropping the repeat feature prevents future breakage. This renders the hints unused. Purge them from the block layer. Signed-off-by: NMarkus Armbruster <armbru@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 15 6月, 2012 1 次提交
-
-
由 Kevin Wolf 提交于
The QED block driver already provides the functionality to not only detect inconsistencies in images, but also fix them. However, this functionality cannot be manually invoked with qemu-img, but the check happens only automatically during bdrv_open(). This adds a -r switch to qemu-img check that allows manual invocation of an image repair. Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 30 5月, 2012 2 次提交
-
-
由 Jim Meyering 提交于
In snapshot mode, bdrv_open creates an empty temporary file without checking for mkstemp or close failure, and ignoring the possibility of a buffer overrun given a surprisingly long $TMPDIR. Change the get_tmp_filename function to return int (not void), so that it can inform its two callers of those failures. Also avoid the risk of buffer overrun and do not ignore mkstemp or close failure. Update both callers (in block.c and vvfat.c) to propagate temp-file-creation failure to their callers. get_tmp_filename creates and closes an empty file, while its callers later open that presumed-existing file with O_CREAT. The problem was that a malicious user could provoke mkstemp failure and race to create a symlink with the selected temporary file name, thus causing the qemu process (usually root owned) to open through the symlink, overwriting an attacker-chosen file. This addresses CVE-2012-2652. http://bugzilla.redhat.com/CVE-2012-2652Signed-off-by: NJim Meyering <meyering@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Jim Meyering 提交于
In snapshot mode, bdrv_open creates an empty temporary file without checking for mkstemp or close failure, and ignoring the possibility of a buffer overrun given a surprisingly long $TMPDIR. Change the get_tmp_filename function to return int (not void), so that it can inform its two callers of those failures. Also avoid the risk of buffer overrun and do not ignore mkstemp or close failure. Update both callers (in block.c and vvfat.c) to propagate temp-file-creation failure to their callers. get_tmp_filename creates and closes an empty file, while its callers later open that presumed-existing file with O_CREAT. The problem was that a malicious user could provoke mkstemp failure and race to create a symlink with the selected temporary file name, thus causing the qemu process (usually root owned) to open through the symlink, overwriting an attacker-chosen file. This addresses CVE-2012-2652. http://bugzilla.redhat.com/CVE-2012-2652Reviewed-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NJim Meyering <meyering@redhat.com> Signed-off-by: NAnthony Liguori <aliguori@us.ibm.com>
-
- 10 5月, 2012 3 次提交
-
-
由 Paolo Bonzini 提交于
The limitation on not having I/O after cancellation cannot really be kept. Even streaming has a very small race window where you could cancel a job and have it report completion. If this window is hit, bdrv_change_backing_file() will yield and possibly cause accesses to dangling pointers etc. So, let's just assume that we cannot know exactly what will happen after the coroutine has set busy to false. We can set a very lax condition: - if we cancel the job, the coroutine won't set it to false again (and hence will not call co_sleep_ns again). - block_job_cancel_sync will wait for the coroutine to exit, which pretty much ensures no race. Instead, we track the coroutine that executes the job and put very strict conditions on what to do while it is quiescent (busy = false). First of all, the coroutine must never set busy = false while the job has been cancelled. Second, the coroutine can be reentered arbitrarily while it is quiescent, so you cannot really do anything but co_sleep_ns at that time. This condition is obeyed by the block_job_sleep_ns function. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Paolo Bonzini 提交于
This function abstracts the pretty complex semantics of the "busy" member of BlockJob. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Paolo Bonzini 提交于
QED's opaque data includes a pointer back to the BlockDriverState. This breaks when bdrv_append shuffles data between bs_new and bs_top. To avoid this, add a "rebind" function that tells the driver about the new relationship between the BlockDriverState and its opaque. The patch also adds rebind to VVFAT for completeness, even though it is not used with live snapshots. Reviewed-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 27 4月, 2012 4 次提交
-
-
由 Stefan Hajnoczi 提交于
Allow streaming operations to be started with an initial speed limit. This eliminates the window of time between starting streaming and issuing block-job-set-speed. Users should use the new optional 'speed' parameter instead so that speed limits are in effect immediately when the job starts. Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: NKevin Wolf <kwolf@redhat.com> Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
-
由 Stefan Hajnoczi 提交于
Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: NKevin Wolf <kwolf@redhat.com> Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
-
由 Stefan Hajnoczi 提交于
There are at least two different errors that can occur in block_job_set_speed(): the job might not support setting speeds or the value might be invalid. Use the Error mechanism to report the error where it occurs. Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: NKevin Wolf <kwolf@redhat.com> Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
-
由 Stefan Hajnoczi 提交于
The block job API uses -errno return values internally and we convert these to Error in the QMP functions. This is ugly because the Error should be created at the point where we still have all the relevant information. More importantly, it is hard to add new error cases to this case since we quickly run out of -errno values without losing information. Go ahead and use Error directly and don't convert later. Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: NKevin Wolf <kwolf@redhat.com> Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
-
- 20 4月, 2012 1 次提交
-
-
由 Kevin Wolf 提交于
This adds the basic infrastructure to qcow2 to handle version 3 images. It includes code to create v3 images, allow header updates for v3 images and checks feature bits. It still misses support for zero clusters, so this is not a fully compliant implementation of v3 yet. The default for creating new images stays at v2 for now. Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 05 4月, 2012 3 次提交
-
-
由 Paolo Bonzini 提交于
I am not sure that these are really proper GtkDoc, but they follow the existing documentation in block_int.h. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Paolo Bonzini 提交于
Streaming can issue I/O while qcow2_close is running. This causes the L2 caches to become very confused or, alternatively, could cause a segfault when the streaming coroutine is reentered after closing its block device. The fix is to cancel streaming jobs when closing their underlying device. The cancellation must be synchronous, on the other hand qemu_aio_wait will not restart a coroutine that is sleeping in co_sleep. So add a flag saying whether streaming has in-flight I/O. If the busy flag is false, the coroutine is quiescent and, when cancelled, will not issue any new I/O. This protects streaming against closing, but not against deleting. We have a reference count protecting us against concurrent deletion, but I still added an assertion to ensure nothing bad happens. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Paolo Bonzini 提交于
And remove several block_int.h inclusions that should not be there. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 29 2月, 2012 3 次提交
-
-
由 Jeff Cody 提交于
This is a QAPI/QMP only command to take a snapshot of a group of devices. This is similar to the blockdev-snapshot-sync command, except blockdev-group-snapshot-sync accepts a list devices, filenames, and formats. It is attempted to keep the snapshot of the group atomic; if the creation or open of any of the new snapshots fails, then all of the new snapshots are abandoned, and the name of the snapshot image that failed is returned. The failure case should not interrupt any operations. Rather than use bdrv_close() along with a subsequent bdrv_open() to perform the pivot, the original image is never closed and the new image is placed 'in front' of the original image via manipulation of the BlockDriverState fields. Thus, once the new snapshot image has been successfully created, there are no more failure points before pivoting to the new snapshot. This allows the group of disks to remain consistent with each other, even across snapshot failures. Signed-off-by: NJeff Cody <jcody@redhat.com> Acked-by: NLuiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Paolo Bonzini 提交于
These were never used. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Paolo Bonzini 提交于
sync_aiocb is unused since commit ce1a14dc (Dynamically allocate AIO Completion Blocks., 2006-08-07). private is unused since commit 56a14938 (drive cleanup fixes., 2009-09-25). Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 23 2月, 2012 1 次提交
-
-
由 Luiz Capitulino 提交于
Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com> Reviewed-by: NMarkus Armbruster <armbru@redhat.com> Acked-by: NKevin Wolf <kwolf@redhat.com>
-
- 09 2月, 2012 1 次提交
-
-
由 Stefan Hajnoczi 提交于
The ability to zero regions of an image file is a useful primitive for higher-level features such as image streaming or zero write detection. Image formats may support an optimized metadata representation instead of writing zeroes into the image file. This allows zero writes to be potentially faster than regular write operations and also preserve sparseness of the image file. The .bdrv_co_write_zeroes() interface should be implemented by block drivers that wish to provide efficient zeroing. Note that this operation is different from the discard operation, which may leave the contents of the region indeterminate. That means discarded blocks are not guaranteed to contain zeroes and may contain junk data instead. Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 26 1月, 2012 4 次提交
-
-
由 Marcelo Tosatti 提交于
Add support for streaming data from an intermediate section of the image chain (see patch and documentation for details). Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Stefan Hajnoczi 提交于
Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Stefan Hajnoczi 提交于
Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Stefan Hajnoczi 提交于
Previously copy-on-read could only be enabled for all requests to a block device. This means requests coming from the guest as well as QEMU's internal requests would perform copy-on-read when enabled. For image streaming we want to support finer-grained behavior than just populating the image file from its backing image. Image streaming supports partial streaming where a common backing image is preserved. In this case guest requests should not perform copy-on-read because they would indiscriminately copy data which should be left in a backing image from the backing chain. Introduce a per-request flag for copy-on-read so that a block device can process both regular and copy-on-read requests. Overlapping reads and writes still need to be serialized for correctness when copy-on-read is happening, so add an in-flight reference count to track this. Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 05 12月, 2011 3 次提交
-
-
由 Stefan Hajnoczi 提交于
The bdrv_enable_copy_on_read()/bdrv_disable_copy_on_read() functions can be used to programmatically enable or disable copy-on-read for a block device. Later patches add the actual copy-on-read logic. Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Stefan Hajnoczi 提交于
The block layer does not know about pending requests. This information is necessary for copy-on-read since overlapping requests must be serialized to prevent races that corrupt the image. The BlockDriverState gets a new tracked_request list field which contains all pending requests. Each request is a BdrvTrackedRequest record with sector_num, nb_sectors, and is_write fields. Note that request tracking is always enabled but hopefully this extra work is so small that it doesn't justify adding an enable/disable flag. Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Stefan Hajnoczi 提交于
Now that all block drivers have been converted to .bdrv_co_is_allocated() we can drop .bdrv_is_allocated(). Note that the public bdrv_is_allocated() interface is still available but is in fact a synchronous wrapper around .bdrv_co_is_allocated(). Signed-off-by: NStefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-