- 20 1月, 2016 3 次提交
-
-
由 Kevin Wolf 提交于
Instead of covering only the state of images on the migration destination before the migration is completed, the flag will also cover the state of images on the migration source after completion. This common state implies that the image is technically still open, but no writes will happen and any cached contents will be reloaded from disk if and when the image leaves this state. Signed-off-by: NKevin Wolf <kwolf@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com>
-
由 Kevin Wolf 提交于
As long as BDRV_O_INCOMING is set, the image file is only opened so we have a file descriptor for it. We're definitely not supposed to modify the image, it's still owned by the migration source. Signed-off-by: NKevin Wolf <kwolf@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com>
-
由 Peter Maydell 提交于
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: NPeter Maydell <peter.maydell@linaro.org> Reviewed-by: NEric Blake <eblake@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 22 12月, 2015 2 次提交
-
-
由 Stefan Hajnoczi 提交于
Request merging must not result in a huge request that exceeds the maximum number of iovec elements. Use BlockLimits.max_iov instead of hardcoding IOV_MAX. Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Stefan Hajnoczi 提交于
The maximum number of struct iovec elements depends on the BlockDriverState. The raw-posix and iSCSI protocols have a maximum of IOV_MAX but others could have different values. Cc: Peter Lieven <pl@kamp.de> Suggested-by: NKevin Wolf <kwolf@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
- 18 12月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
When called from a coroutine, bdrv_ioctl must be asynchronous just like e.g. bdrv_flush. The code was incorrectly making it synchronous, fix it. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NFam Zheng <famz@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 03 12月, 2015 1 次提交
-
-
由 Fam Zheng 提交于
The assertion problem was noticed in 06c3916b, but it wasn't completely fixed, because even though the req is not marked as serialising, it still gets serialised by wait_serialising_requests against other serialising requests, which could lead to the same assertion failure. Fix it by even more explicitly skipping the serialising for this specific case. Signed-off-by: NFam Zheng <famz@redhat.com> Message-id: 1448962590-2842-2-git-send-email-famz@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
- 12 11月, 2015 5 次提交
-
-
由 Fam Zheng 提交于
Drivers can have internal request sources that generate IO, like the need_check_timer in QED. Since we want quiesced periods that contain nested event loops in block layer, we need to have a way to disable such event sources. Block drivers must implement the "bdrv_drain" callback if it has any internal sources that can generate I/O activity, like a timer or a worker thread (even in a library) that can schedule QEMUBH in an asynchronous callback. Update the comments of bdrv_drain and bdrv_drained_begin accordingly. Like bdrv_requests_pending(), we should consider all the children of bs. Before, the while loop just works, as bdrv_requests_pending() already tracks its children; now we mustn't miss the callback, so recurse down explicitly. Signed-off-by: NFam Zheng <famz@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Message-id: 1447064214-29930-9-git-send-email-famz@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Fam Zheng 提交于
Currently all drivers that support .bdrv_aio_ioctl also implement .bdrv_ioctl redundantly. To track ioctl requests in block layer it is easier if we unify the two paths, because we'll need to run it in a coroutine, as required by tracked_request_begin. While we're at it, use .bdrv_aio_ioctl plus aio_poll() to emulate bdrv_ioctl(). Signed-off-by: NFam Zheng <famz@redhat.com> Message-id: 1447064214-29930-7-git-send-email-famz@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Fam Zheng 提交于
Both bdrv_discard and bdrv_aio_discard will call into bdrv_co_discard, so add tracked_request_begin/end calls around the loop. Signed-off-by: NFam Zheng <famz@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-4-git-send-email-famz@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Fam Zheng 提交于
Both bdrv_flush and bdrv_aio_flush eventually call bdrv_co_flush, add tracked_request_begin and tracked_request_end pair in that function so that all flush requests are now tracked. Signed-off-by: NFam Zheng <famz@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-3-git-send-email-famz@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Fam Zheng 提交于
We'll track more request types besides read and write, change the boolean field to an enum. Signed-off-by: NFam Zheng <famz@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Message-id: 1447064214-29930-2-git-send-email-famz@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 30 10月, 2015 1 次提交
-
-
由 Kevin Wolf 提交于
The function manually recursed into bs->file and bs->backing to check whether there were any requests pending, but it ignored other children. There's no need to special case file and backing here, so just replace these two explicit recursions by a loop recursing for all child nodes. Reported-by: NMax Reitz <mreitz@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com> Reviewed-by: NAlberto Garcia <berto@igalia.com> Reviewed-by: NJeff Cody <jcody@redhat.com> Message-id: 1446029211-27148-1-git-send-email-kwolf@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
- 24 10月, 2015 3 次提交
-
-
由 Fam Zheng 提交于
The semantics is that after bdrv_drained_begin(bs), bs will not get new external requests until the matching bdrv_drained_end(bs). Signed-off-by: NFam Zheng <famz@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Max Reitz 提交于
As the comment above bdrv_get_stats() says, BlockAcctStats is something which belongs to the device instead of each BlockDriverState. This patch therefore moves it into the BlockBackend. Signed-off-by: NMax Reitz <mreitz@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Reviewed-by: NAlberto Garcia <berto@igalia.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
由 Max Reitz 提交于
BlockAcctStats contains statistics about the data transferred from and to the device; wr_highest_sector does not fit in with the rest. Furthermore, those statistics are supposed to be specific for a certain device and not necessarily for a BDS (see the comment above bdrv_get_stats()); on the other hand, wr_highest_sector may be a rather important information to know for each BDS. When BlockAcctStats is finally removed from the BDS, we will want to keep wr_highest_sector in the BDS. Finally, wr_highest_sector is renamed to wr_highest_offset and given the appropriate meaning. Externally, it is represented as an offset so there is no point in doing something different internally. Its definition is changed to match that in qapi/block-core.json which is "the offset after the greatest byte written to". Doing so should not cause any harm since if external programs tried to calculate the volume usage by (wr_highest_offset + 512) / volume_size, after this patch they will just assume the volume to be full slightly earlier than before. Signed-off-by: NMax Reitz <mreitz@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Reviewed-by: NAlberto Garcia <berto@igalia.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-
- 16 10月, 2015 3 次提交
-
-
由 Kevin Wolf 提交于
Signed-off-by: NKevin Wolf <kwolf@redhat.com> Reviewed-by: NMax Reitz <mreitz@redhat.com> Reviewed-by: NAlberto Garcia <berto@igalia.com> Reviewed-by: NFam Zheng <famz@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Kevin Wolf 提交于
This is the final step in converting all of the BlockDriverState pointers that block drivers use to BdrvChild. After this patch, bs->children contains the full list of child nodes that are referenced by a given BDS, and these children are only referenced through BdrvChild, so that updating the pointer in there is enough for changing edges in the graph. Signed-off-by: NKevin Wolf <kwolf@redhat.com> Reviewed-by: NAlberto Garcia <berto@igalia.com> Reviewed-by: NMax Reitz <mreitz@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Kevin Wolf 提交于
This patch removes the temporary duplication between bs->file and bs->file_child by converting everything to BdrvChild. Signed-off-by: NKevin Wolf <kwolf@redhat.com> Reviewed-by: NMax Reitz <mreitz@redhat.com> Reviewed-by: NAlberto Garcia <berto@igalia.com> Reviewed-by: NFam Zheng <famz@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
-
- 12 10月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
Simplify memory allocation by sticking with a single API. GSlice is not that fast anyway (tcmalloc/jemalloc are better). Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
- 25 9月, 2015 1 次提交
-
-
由 Wen Congyang 提交于
In some cases, we need to disable copy-on-read, and just read the data. Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Message-id: 1441682913-14320-2-git-send-email-wency@cn.fujitsu.com Signed-off-by: NJeff Cody <jcody@redhat.com>
-
- 07 7月, 2015 1 次提交
-
-
由 Stefan Hajnoczi 提交于
The doc comments for bdrv_drain_all() and bdrv_drain() are outdated: * The bdrv_drain() comment is a poor man's bdrv_lock()/bdrv_unlock() which Fam Zheng is currently developing. Unfortunately this warning was never really enough because devices keep submitting I/O and op blockers don't prevent that. * The bdrv_drain_all() comment is still partially correct but reflects the nature of the implementation rather than API documentation. Do make it clear that bdrv_drain() is only appropriate within an AioContext. For anything spanning AioContexts you need bdrv_drain_all(). Cc: Markus Armbruster <armbru@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NMarkus Armbruster <armbru@redhat.com> Reviewed-by: NFam Zheng <famz@redhat.com> Message-id: 1435854281-6078-1-git-send-email-stefanha@redhat.com
-
- 02 7月, 2015 3 次提交
-
-
由 Alberto Garcia 提交于
An empty GSList is represented by a NULL pointer, therefore it's a perfectly valid argument for g_slist_find() and there's no need to make any additional check. Signed-off-by: NAlberto Garcia <berto@igalia.com> Message-id: 1435583533-5758-1-git-send-email-berto@igalia.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Fam Zheng 提交于
Unsetting dirty globally with discard is not very correct. The discard may zero out sectors (depending on can_write_zeroes_with_unmap), we should replicate this change to destination side to make sure that the guest sees the same data. Calling bdrv_reset_dirty also troubles mirror job because the hbitmap iterator doesn't expect unsetting of bits after current position. So let's do it the opposite way which fixes both problems: set the dirty bits if we are to discard it. Reported-by: wangxiaolong@ucloud.cn Signed-off-by: NFam Zheng <famz@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Fam Zheng 提交于
Like bdrv_is_allocated_above, this function follows the backing chain until seeing BDRV_BLOCK_ALLOCATED. Base is not included. Reimplement bdrv_is_allocated on top. [Initialized bdrv_co_get_block_status_above() ret to 0 to silence mingw64 compiler warning about the unitialized variable. assert(bs != base) prevents that case but I suppose the program could be compiled with -DNDEBUG. --Stefan] Signed-off-by: NFam Zheng <famz@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
- 23 6月, 2015 3 次提交
-
-
由 Dimitris Aragiorgis 提交于
During migration, QEMU uses fsync()/fdatasync() on the open file descriptor for read-write block devices to flush data just before stopping the VM. However, fsync() on a scsi-generic device returns -EINVAL which causes the migration to fail. This patch skips flushing data in case of an SG device, since submitting SCSI commands directly via an SG character device (e.g. /dev/sg0) bypasses the page cache completely, anyway. Note that fsync() not only flushes the page cache but also the disk cache. The scsi-generic device never sends flushes, and for migration it assumes that the same SCSI device is used by the destination host, so it does not issue any SCSI SYNCHRONIZE CACHE (10) command. Finally, remove the bdrv_is_sg() test from iscsi_co_flush() since this is now redundant (we flush the underlying protocol at the end of bdrv_co_flush() which, with this patch, we never reach). Signed-off-by: NDimitris Aragiorgis <dimara@arrikto.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Message-id: 1435056300-14924-3-git-send-email-dimara@arrikto.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Alexander Yarygin 提交于
After the commit 9b536adc ("block: acquire AioContext in bdrv_drain_all()") the aio_poll() function got called for every BlockDriverState, in assumption that every device may have its own AioContext. If we have thousands of disks attached, there are a lot of BlockDriverStates but only a few AioContexts, leading to tons of unnecessary aio_poll() calls. This patch changes the bdrv_drain_all() function allowing it find shared AioContexts and to call aio_poll() only for unique ones. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NAlexander Yarygin <yarygin@linux.vnet.ibm.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Tested-by: NChristian Borntraeger <borntraeger@de.ibm.com> Message-id: 1433936297-7098-4-git-send-email-yarygin@linux.vnet.ibm.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Markus Armbruster 提交于
Signed-off-by: NMarkus Armbruster <armbru@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Reviewed-by: NLuiz Capitulino <lcapitulino@redhat.com>
-
- 12 6月, 2015 2 次提交
-
-
由 Alberto Garcia 提交于
The throttle group support use a cooperative round robin scheduling algorithm. The principles of the algorithm are simple: - Each BDS of the group is used as a token in a circular way. - The active BDS computes if a wait must be done and arms the right timer. - If a wait must be done the token timer will be armed so the token will become the next active BDS. Signed-off-by: NAlberto Garcia <berto@igalia.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Message-id: f0082a86f3ac01c46170f7eafe2101a92e8fde39.1433779731.git.berto@igalia.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Benoît Canet 提交于
Group throttling will share ThrottleState between multiple bs. As a consequence the ThrottleState will be accessed by multiple aio context. Timers are tied to their aio context so they must go out of the ThrottleState structure. This commit paves the way for each bs of a common ThrottleState to have its own timer. Signed-off-by: NBenoit Canet <benoit.canet@nodalink.com> Signed-off-by: NAlberto Garcia <berto@igalia.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Message-id: 6cf9ea96d8b32ae2f8769cead38f68a6a0c8c909.1433779731.git.berto@igalia.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
- 22 5月, 2015 6 次提交
-
-
由 Paolo Bonzini 提交于
A bit of Boolean algebra (and common sense) tells us that the second "if" here is looking for blocks that are not allocated. This is the opposite of the "if" that sets BDRV_BLOCK_ALLOCATED, and thus it can use an "else". Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NEric Blake <eblake@redhat.com> Reviewed-by: NFam Zheng <famz@redhat.com> Message-id: 1431599702-10431-1-git-send-email-pbonzini@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Fam Zheng 提交于
For zero write, callers pass in NULL qiov (qemu-io "write -z" or scsi-disk "write same"). Commit fc3959e4 fixed bdrv_co_write_zeroes which is the common case for this bug, but it still exists in bdrv_aio_write_zeroes. A simpler fix would be in bdrv_co_do_pwritev which is the NULL dereference point and covers both cases. So don't access it in bdrv_co_do_pwritev in this case, use three aligned writes. [Initialize ret to 0 in bdrv_co_do_zero_pwritev() to avoid uninitialized variable warning with gcc 4.9.2. --Stefan] Signed-off-by: NFam Zheng <famz@redhat.com> Message-id: 1431522721-3266-3-git-send-email-famz@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Fam Zheng 提交于
This reverts commit fc3959e4. The core write code already handles the case, so remove this duplication. Because commit 61007b31 moved the touched code from block.c to block/io.c, the change is manually reverted. Signed-off-by: NFam Zheng <famz@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Message-id: 1431522721-3266-2-git-send-email-famz@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Denis V. Lunev 提交于
The following sequence int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644); for (i = 0; i < 100000; i++) write(fd, buf, 4096); performs 5% better if buf is aligned to 4096 bytes. The difference is quite reliable. On the other hand we do not want at the moment to enforce bounce buffering if guest request is aligned to 512 bytes. The patch changes default bounce buffer optimal alignment to MAX(page size, 4k). 4k is chosen as maximal known sector size on real HDD. The justification of the performance improve is quite interesting. From the kernel point of view each request to the disk was split by two. This could be seen by blktrace like this: 9,0 11 1 0.000000000 11151 Q WS 312737792 + 1023 [qemu-img] 9,0 11 2 0.000007938 11151 Q WS 312738815 + 8 [qemu-img] 9,0 11 3 0.000030735 11151 Q WS 312738823 + 1016 [qemu-img] 9,0 11 4 0.000032482 11151 Q WS 312739839 + 8 [qemu-img] 9,0 11 5 0.000041379 11151 Q WS 312739847 + 1016 [qemu-img] 9,0 11 6 0.000042818 11151 Q WS 312740863 + 8 [qemu-img] 9,0 11 7 0.000051236 11151 Q WS 312740871 + 1017 [qemu-img] 9,0 5 1 0.169071519 11151 Q WS 312741888 + 1023 [qemu-img] After the patch the pattern becomes normal: 9,0 6 1 0.000000000 12422 Q WS 314834944 + 1024 [qemu-img] 9,0 6 2 0.000038527 12422 Q WS 314835968 + 1024 [qemu-img] 9,0 6 3 0.000072849 12422 Q WS 314836992 + 1024 [qemu-img] 9,0 6 4 0.000106276 12422 Q WS 314838016 + 1024 [qemu-img] and the amount of requests sent to disk (could be calculated counting number of lines in the output of blktrace) is reduced about 2 times. Both qemu-img and qemu-io are affected while qemu-kvm is not. The guest does his job well and real requests comes properly aligned (to page). Signed-off-by: NDenis V. Lunev <den@openvz.org> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-3-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Denis V. Lunev 提交于
The patch introduces new concept: minimal memory alignment for bounce buffers. Original so called "optimal" value is actually minimal required value for aligment. It should be used for validation that the IOVec is properly aligned and bounce buffer is not required. Though, from the performance point of view, it would be better if bounce buffer or IOVec allocated by QEMU will be aligned stricter. The patch does not change any alignment value yet. Signed-off-by: NDenis V. Lunev <den@openvz.org> Reviewed-by: NKevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-2-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: NKevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
由 Paolo Bonzini 提交于
This is the behavior in the operating system, for example Linux's blkdev_write_iter has the following: if (bdev_read_only(I_BDEV(bd_inode))) return -EPERM; This does not apply to opening a device for read/write, when the device only supports read-only operation. In this case any of EACCES, EPERM or EROFS is acceptable depending on why writing is not possible. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-id: 1431013548-22492-1-git-send-email-pbonzini@redhat.com Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
-
- 28 4月, 2015 1 次提交
-
-
由 Stefan Hajnoczi 提交于
The block.c file has grown to over 6000 lines. It is time to split this file so there are fewer conflicts and the code is easier to maintain. Extract I/O request processing code: * Read * Write * Zero writes and making the image empty * Flush * Discard * ioctl * Tracked requests and queuing * Throttling and copy-on-read * Block status and allocated functions * Refreshing block limits * Reading/writing vmstate * qemu_blockalign() and friends The patch simply moves code from block.c into block/io.c. Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NKevin Wolf <kwolf@redhat.com>
-