提交 · 1963f8d52e04a8f8b213e34e6a76fb286fb23ec1 · openeuler / qemu

20 1月, 2016 3 次提交

block: Rename BDRV_O_INCOMING to BDRV_O_INACTIVE · 04c01a5c

由 Kevin Wolf 提交于 1月 13, 2016

Instead of covering only the state of images on the migration
destination before the migration is completed, the flag will also cover
the state of images on the migration source after completion. This
common state implies that the image is technically still open, but no
writes will happen and any cached contents will be reloaded from disk if
and when the image leaves this state.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>

04c01a5c

block: Assert no write requests under BDRV_O_INCOMING · 09e0c771

由 Kevin Wolf 提交于 12月 16, 2015

As long as BDRV_O_INCOMING is set, the image file is only opened so we
have a file descriptor for it. We're definitely not supposed to modify
the image, it's still owned by the migration source.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>

09e0c771

block: Clean up includes · 80c71a24

由 Peter Maydell 提交于 1月 18, 2016

Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.

This commit was created with scripts/clean-includes.
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NEric Blake <eblake@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

80c71a24

22 12月, 2015 2 次提交

block: replace IOV_MAX with BlockLimits.max_iov · 222565f6

由 Stefan Hajnoczi 提交于 7月 09, 2015

Request merging must not result in a huge request that exceeds the
maximum number of iovec elements.  Use BlockLimits.max_iov instead of
hardcoding IOV_MAX.
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

222565f6

block: add BlockLimits.max_iov field · bd44feb7

由 Stefan Hajnoczi 提交于 7月 09, 2015

The maximum number of struct iovec elements depends on the
BlockDriverState.  The raw-posix and iSCSI protocols have a maximum of
IOV_MAX but others could have different values.

Cc: Peter Lieven <pl@kamp.de>
Suggested-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

bd44feb7

18 12月, 2015 1 次提交

block: fix bdrv_ioctl called from coroutine · ba889444

由 Paolo Bonzini 提交于 12月 16, 2015

When called from a coroutine, bdrv_ioctl must be asynchronous just like
e.g. bdrv_flush.  The code was incorrectly making it synchronous, fix
it.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

ba889444

03 12月, 2015 1 次提交

block: Don't wait serialising for non-COR read requests · 61408b25

由 Fam Zheng 提交于 12月 01, 2015

The assertion problem was noticed in 06c3916b, but it wasn't
completely fixed, because even though the req is not marked as
serialising, it still gets serialised by wait_serialising_requests
against other serialising requests, which could lead to the same
assertion failure.

Fix it by even more explicitly skipping the serialising for this
specific case.
Signed-off-by: NFam Zheng <famz@redhat.com>
Message-id: 1448962590-2842-2-git-send-email-famz@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

61408b25

12 11月, 2015 5 次提交

block: Introduce BlockDriver.bdrv_drain callback · 67da1dc5

由 Fam Zheng 提交于 11月 09, 2015

Drivers can have internal request sources that generate IO, like the
need_check_timer in QED. Since we want quiesced periods that contain
nested event loops in block layer, we need to have a way to disable such
event sources.

Block drivers must implement the "bdrv_drain" callback if it has any
internal sources that can generate I/O activity, like a timer or a
worker thread (even in a library) that can schedule QEMUBH in an
asynchronous callback.

Update the comments of bdrv_drain and bdrv_drained_begin accordingly.

Like bdrv_requests_pending(), we should consider all the children of bs.
Before, the while loop just works, as bdrv_requests_pending() already
tracks its children; now we mustn't miss the callback, so recurse down
explicitly.
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1447064214-29930-9-git-send-email-famz@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

67da1dc5

block: Emulate bdrv_ioctl with bdrv_aio_ioctl and track both · 5c5ae76a

由 Fam Zheng 提交于 11月 09, 2015

Currently all drivers that support .bdrv_aio_ioctl also implement
.bdrv_ioctl redundantly.  To track ioctl requests in block layer it is
easier if we unify the two paths, because we'll need to run it in a
coroutine, as required by tracked_request_begin. While we're at it, use
.bdrv_aio_ioctl plus aio_poll() to emulate bdrv_ioctl().
Signed-off-by: NFam Zheng <famz@redhat.com>
Message-id: 1447064214-29930-7-git-send-email-famz@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

5c5ae76a

block: Track discard requests · b1066c87

由 Fam Zheng 提交于 11月 09, 2015

Both bdrv_discard and bdrv_aio_discard will call into bdrv_co_discard,
so add tracked_request_begin/end calls around the loop.
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-4-git-send-email-famz@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

b1066c87

block: Track flush requests · cdb5e315

由 Fam Zheng 提交于 11月 09, 2015

Both bdrv_flush and bdrv_aio_flush eventually call bdrv_co_flush, add
tracked_request_begin and tracked_request_end pair in that function so
that all flush requests are now tracked.
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-3-git-send-email-famz@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

cdb5e315

block: Add more types for tracked request · ebde595c

由 Fam Zheng 提交于 11月 09, 2015

We'll track more request types besides read and write, change the
boolean field to an enum.
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-2-git-send-email-famz@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

ebde595c

30 10月, 2015 1 次提交

block: Consider all child nodes in bdrv_requests_pending() · 37a639a7

由 Kevin Wolf 提交于 10月 28, 2015

The function manually recursed into bs->file and bs->backing to check
whether there were any requests pending, but it ignored other children.

There's no need to special case file and backing here, so just replace
these two explicit recursions by a loop recursing for all child nodes.
Reported-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NJeff Cody <jcody@redhat.com>
Message-id: 1446029211-27148-1-git-send-email-kwolf@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

37a639a7

24 10月, 2015 3 次提交

block: Introduce "drained begin/end" API · 51288d79

由 Fam Zheng 提交于 10月 23, 2015

The semantics is that after bdrv_drained_begin(bs), bs will not get new external
requests until the matching bdrv_drained_end(bs).
Signed-off-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

51288d79

block: Move BlockAcctStats into BlockBackend · 7f0e9da6

由 Max Reitz 提交于 10月 19, 2015

As the comment above bdrv_get_stats() says, BlockAcctStats is something
which belongs to the device instead of each BlockDriverState. This patch
therefore moves it into the BlockBackend.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

7f0e9da6

block: Remove wr_highest_sector from BlockAcctStats · 53d8f9d8

由 Max Reitz 提交于 10月 19, 2015

BlockAcctStats contains statistics about the data transferred from and
to the device; wr_highest_sector does not fit in with the rest.

Furthermore, those statistics are supposed to be specific for a certain
device and not necessarily for a BDS (see the comment above
bdrv_get_stats()); on the other hand, wr_highest_sector may be a rather
important information to know for each BDS. When BlockAcctStats is
finally removed from the BDS, we will want to keep wr_highest_sector in
the BDS.

Finally, wr_highest_sector is renamed to wr_highest_offset and given the
appropriate meaning. Externally, it is represented as an offset so there
is no point in doing something different internally. Its definition is
changed to match that in qapi/block-core.json which is "the offset after
the greatest byte written to". Doing so should not cause any harm since
if external programs tried to calculate the volume usage by
(wr_highest_offset + 512) / volume_size, after this patch they will just
assume the volume to be full slightly earlier than before.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

53d8f9d8

16 10月, 2015 3 次提交

block/io: Make bdrv_requests_pending() public · 439db28c

由 Kevin Wolf 提交于 9月 16, 2015

Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>

439db28c

block: Convert bs->backing_hd to BdrvChild · 760e0063

由 Kevin Wolf 提交于 6月 17, 2015

This is the final step in converting all of the BlockDriverState
pointers that block drivers use to BdrvChild.

After this patch, bs->children contains the full list of child nodes
that are referenced by a given BDS, and these children are only
referenced through BdrvChild, so that updating the pointer in there is
enough for changing edges in the graph.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>

760e0063

block: Convert bs->file to BdrvChild · 9a4f4c31

由 Kevin Wolf 提交于 6月 16, 2015

This patch removes the temporary duplication between bs->file and
bs->file_child by converting everything to BdrvChild.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>

9a4f4c31

12 10月, 2015 1 次提交

block: switch from g_slice allocator to malloc · c84b3192

由 Paolo Bonzini 提交于 10月 01, 2015

Simplify memory allocation by sticking with a single API.  GSlice
is not that fast anyway (tcmalloc/jemalloc are better).
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

c84b3192

25 9月, 2015 1 次提交

block: Introduce a new API bdrv_co_no_copy_on_readv() · 9568b511

由 Wen Congyang 提交于 9月 08, 2015

In some cases, we need to disable copy-on-read, and just
read the data.
Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
Message-id: 1441682913-14320-2-git-send-email-wency@cn.fujitsu.com
Signed-off-by: NJeff Cody <jcody@redhat.com>

9568b511

07 7月, 2015 1 次提交

block: update bdrv_drain_all()/bdrv_drain() comments · 7a63f3cd

由 Stefan Hajnoczi 提交于 7月 02, 2015

The doc comments for bdrv_drain_all() and bdrv_drain() are outdated:

 * The bdrv_drain() comment is a poor man's bdrv_lock()/bdrv_unlock()
   which Fam Zheng is currently developing.  Unfortunately this warning
   was never really enough because devices keep submitting I/O and op
   blockers don't prevent that.

 * The bdrv_drain_all() comment is still partially correct but reflects
   the nature of the implementation rather than API documentation.

Do make it clear that bdrv_drain() is only appropriate within an
AioContext.  For anything spanning AioContexts you need
bdrv_drain_all().

Cc: Markus Armbruster <armbru@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 1435854281-6078-1-git-send-email-stefanha@redhat.com

7a63f3cd

02 7月, 2015 3 次提交

block: remove redundant check before g_slist_find() · 764ba3ae

由 Alberto Garcia 提交于 6月 29, 2015

An empty GSList is represented by a NULL pointer, therefore it's a
perfectly valid argument for g_slist_find() and there's no need to
make any additional check.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Message-id: 1435583533-5758-1-git-send-email-berto@igalia.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

764ba3ae

block: Fix dirty bitmap in bdrv_co_discard · 50824995

由 Fam Zheng 提交于 6月 08, 2015

Unsetting dirty globally with discard is not very correct. The discard may zero
out sectors (depending on can_write_zeroes_with_unmap), we should replicate
this change to destination side to make sure that the guest sees the same data.

Calling bdrv_reset_dirty also troubles mirror job because the hbitmap iterator
doesn't expect unsetting of bits after current position.

So let's do it the opposite way which fixes both problems: set the dirty bits
if we are to discard it.

Reported-by: wangxiaolong@ucloud.cn
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

50824995

block: Add bdrv_get_block_status_above · ba3f0e25

由 Fam Zheng 提交于 6月 08, 2015

Like bdrv_is_allocated_above, this function follows the backing chain until seeing
BDRV_BLOCK_ALLOCATED.  Base is not included.

Reimplement bdrv_is_allocated on top.

[Initialized bdrv_co_get_block_status_above() ret to 0 to silence
mingw64 compiler warning about the unitialized variable.  assert(bs !=
base) prevents that case but I suppose the program could be compiled
with -DNDEBUG.
--Stefan]
Signed-off-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

ba3f0e25

23 6月, 2015 3 次提交

Fix migration in case of scsi-generic · 1b6bc94d

由 Dimitris Aragiorgis 提交于 6月 23, 2015

During migration, QEMU uses fsync()/fdatasync() on the open file
descriptor for read-write block devices to flush data just before
stopping the VM.

However, fsync() on a scsi-generic device returns -EINVAL which
causes the migration to fail. This patch skips flushing data in case
of an SG device, since submitting SCSI commands directly via an SG
character device (e.g. /dev/sg0) bypasses the page cache completely,
anyway.

Note that fsync() not only flushes the page cache but also the disk
cache. The scsi-generic device never sends flushes, and for
migration it assumes that the same SCSI device is used by the
destination host, so it does not issue any SCSI SYNCHRONIZE CACHE
(10) command.

Finally, remove the bdrv_is_sg() test from iscsi_co_flush() since
this is now redundant (we flush the underlying protocol at the end
of bdrv_co_flush() which, with this patch, we never reach).
Signed-off-by: NDimitris Aragiorgis <dimara@arrikto.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Message-id: 1435056300-14924-3-git-send-email-dimara@arrikto.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

1b6bc94d

block: Let bdrv_drain_all() to call aio_poll() for each AioContext · f406c03c

由 Alexander Yarygin 提交于 6月 10, 2015

After the commit 9b536adc ("block: acquire AioContext in
bdrv_drain_all()") the aio_poll() function got called for every
BlockDriverState, in assumption that every device may have its own
AioContext. If we have thousands of disks attached, there are a lot of
BlockDriverStates but only a few AioContexts, leading to tons of
unnecessary aio_poll() calls.

This patch changes the bdrv_drain_all() function allowing it find shared
AioContexts and to call aio_poll() only for unique ones.

Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NAlexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Tested-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Message-id: 1433936297-7098-4-git-send-email-yarygin@linux.vnet.ibm.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

f406c03c

qerror: Move #include out of qerror.h · d49b6836

由 Markus Armbruster 提交于 3月 17, 2015

Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NLuiz Capitulino <lcapitulino@redhat.com>

d49b6836

12 6月, 2015 2 次提交

throttle: Add throttle group support · 76f4afb4

由 Alberto Garcia 提交于 6月 08, 2015

The throttle group support use a cooperative round robin scheduling
algorithm.

The principles of the algorithm are simple:
- Each BDS of the group is used as a token in a circular way.
- The active BDS computes if a wait must be done and arms the right
  timer.
- If a wait must be done the token timer will be armed so the token
  will become the next active BDS.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Message-id: f0082a86f3ac01c46170f7eafe2101a92e8fde39.1433779731.git.berto@igalia.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

76f4afb4

throttle: Extract timers from ThrottleState into a separate structure · 0e5b0a2d

由 Benoît Canet 提交于 6月 08, 2015

Group throttling will share ThrottleState between multiple bs.
As a consequence the ThrottleState will be accessed by multiple aio
context.

Timers are tied to their aio context so they must go out of the
ThrottleState structure.

This commit paves the way for each bs of a common ThrottleState to
have its own timer.
Signed-off-by: NBenoit Canet <benoit.canet@nodalink.com>
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Message-id: 6cf9ea96d8b32ae2f8769cead38f68a6a0c8c909.1433779731.git.berto@igalia.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

0e5b0a2d

22 5月, 2015 6 次提交

block: get_block_status: use "else" when testing the opposite condition · a53f1a95

由 Paolo Bonzini 提交于 5月 14, 2015

A bit of Boolean algebra (and common sense) tells us that the
second "if" here is looking for blocks that are not allocated.
This is the opposite of the "if" that sets BDRV_BLOCK_ALLOCATED,
and thus it can use an "else".
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 1431599702-10431-1-git-send-email-pbonzini@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

a53f1a95

block: Fix NULL deference for unaligned write if qiov is NULL · 9eeb6dd1

由 Fam Zheng 提交于 5月 13, 2015

For zero write, callers pass in NULL qiov (qemu-io "write -z" or
scsi-disk "write same").

Commit fc3959e4 fixed bdrv_co_write_zeroes which is the common case
for this bug, but it still exists in bdrv_aio_write_zeroes. A simpler
fix would be in bdrv_co_do_pwritev which is the NULL dereference point
and covers both cases.

So don't access it in bdrv_co_do_pwritev in this case, use three aligned
writes.

[Initialize ret to 0 in bdrv_co_do_zero_pwritev() to avoid uninitialized
variable warning with gcc 4.9.2.
--Stefan]
Signed-off-by: NFam Zheng <famz@redhat.com>
Message-id: 1431522721-3266-3-git-send-email-famz@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

9eeb6dd1

Revert "block: Fix unaligned zero write" · d01c07f2

由 Fam Zheng 提交于 5月 13, 2015

This reverts commit fc3959e4.

The core write code already handles the case, so remove this
duplication.

Because commit 61007b31 moved the touched code from block.c to
block/io.c, the change is manually reverted.
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Message-id: 1431522721-3266-2-git-send-email-famz@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

d01c07f2

block: align bounce buffers to page · 459b4e66

由 Denis V. Lunev 提交于 5月 12, 2015

The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
            write(fd, buf, 4096);
performs 5% better if buf is aligned to 4096 bytes.

The difference is quite reliable.

On the other hand we do not want at the moment to enforce bounce
buffering if guest request is aligned to 512 bytes.

The patch changes default bounce buffer optimal alignment to
MAX(page size, 4k). 4k is chosen as maximal known sector size on real
HDD.

The justification of the performance improve is quite interesting.
From the kernel point of view each request to the disk was split
by two. This could be seen by blktrace like this:
  9,0   11  1     0.000000000 11151  Q  WS 312737792 + 1023 [qemu-img]
  9,0   11  2     0.000007938 11151  Q  WS 312738815 + 8 [qemu-img]
  9,0   11  3     0.000030735 11151  Q  WS 312738823 + 1016 [qemu-img]
  9,0   11  4     0.000032482 11151  Q  WS 312739839 + 8 [qemu-img]
  9,0   11  5     0.000041379 11151  Q  WS 312739847 + 1016 [qemu-img]
  9,0   11  6     0.000042818 11151  Q  WS 312740863 + 8 [qemu-img]
  9,0   11  7     0.000051236 11151  Q  WS 312740871 + 1017 [qemu-img]
  9,0    5  1     0.169071519 11151  Q  WS 312741888 + 1023 [qemu-img]
After the patch the pattern becomes normal:
  9,0    6  1     0.000000000 12422  Q  WS 314834944 + 1024 [qemu-img]
  9,0    6  2     0.000038527 12422  Q  WS 314835968 + 1024 [qemu-img]
  9,0    6  3     0.000072849 12422  Q  WS 314836992 + 1024 [qemu-img]
  9,0    6  4     0.000106276 12422  Q  WS 314838016 + 1024 [qemu-img]
and the amount of requests sent to disk (could be calculated counting
number of lines in the output of blktrace) is reduced about 2 times.

Both qemu-img and qemu-io are affected while qemu-kvm is not. The guest
does his job well and real requests comes properly aligned (to page).
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Message-id: 1431441056-26198-3-git-send-email-den@openvz.org
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

459b4e66

block: minimal bounce buffer alignment · 4196d2f0

由 Denis V. Lunev 提交于 5月 12, 2015

The patch introduces new concept: minimal memory alignment for bounce
buffers. Original so called "optimal" value is actually minimal required
value for aligment. It should be used for validation that the IOVec
is properly aligned and bounce buffer is not required.

Though, from the performance point of view, it would be better if
bounce buffer or IOVec allocated by QEMU will be aligned stricter.

The patch does not change any alignment value yet.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Message-id: 1431441056-26198-2-git-send-email-den@openvz.org
CC: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

4196d2f0

block: return EPERM on writes or discards to read-only devices · eaf5fe2d

由 Paolo Bonzini 提交于 5月 07, 2015

This is the behavior in the operating system, for example Linux's
blkdev_write_iter has the following:

        if (bdev_read_only(I_BDEV(bd_inode)))
                return -EPERM;

This does not apply to opening a device for read/write, when the
device only supports read-only operation.  In this case any of
EACCES, EPERM or EROFS is acceptable depending on why writing is
not possible.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1431013548-22492-1-git-send-email-pbonzini@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

eaf5fe2d

28 4月, 2015 1 次提交

block: move I/O request processing to block/io.c · 61007b31

由 Stefan Hajnoczi 提交于 4月 28, 2015

The block.c file has grown to over 6000 lines.  It is time to split this
file so there are fewer conflicts and the code is easier to maintain.

Extract I/O request processing code:
 * Read
 * Write
 * Zero writes and making the image empty
 * Flush
 * Discard
 * ioctl
 * Tracked requests and queuing
 * Throttling and copy-on-read
 * Block status and allocated functions
 * Refreshing block limits
 * Reading/writing vmstate
 * qemu_blockalign() and friends

The patch simply moves code from block.c into block/io.c.
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

61007b31