Commits · af91f9a73c3a67eebbf4120cae62b82db8eaae19 · openeuler / qemu

09 Feb, 2014 3 commits

block: bdrv_aligned_pwritev: Assert overlap range · af91f9a7

Committed by Kevin Wolf on Feb 07, 2014

This adds assertions that the request that we actually end up passing to
the block driver (which includes RMW data and has therefore potentially
been rounded to alignment boundaries) is fully covered by the
overlap_{offset,size} fields of the associated BdrvTrackedRequest.
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>

af91f9a7
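
The check described in this commit can be pictured with a small sketch. It is only an illustration under stated assumptions, not the code from the patch: the helper name `range_is_covered` and its parameter names are hypothetical. The only idea taken from the message is that the request handed to the driver must lie inside the overlap range of its tracked request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not QEMU's API): returns true if the byte range
 * [offset, offset + bytes) lies entirely inside the overlap range
 * [overlap_offset, overlap_offset + overlap_bytes). */
static bool range_is_covered(uint64_t offset, uint64_t bytes,
                             uint64_t overlap_offset, uint64_t overlap_bytes)
{
    return offset >= overlap_offset &&
           offset + bytes <= overlap_offset + overlap_bytes;
}
```

A caller would typically wrap this in `assert()` right before passing the (possibly rounded) request to the driver.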

block: Fix memory leaks in bdrv_co_do_pwritev() · 99c4a85c

Committed by Kevin Wolf on Feb 07, 2014

The error path for a failure in one of the two bdrv_aligned_preadv()
calls leaked head_buf or tail_buf, respectively. This fixes the memory
leak.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>

99c4a85c
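
A common way to avoid this kind of leak in C is to route every exit through a single cleanup label. The sketch below shows that pattern only; it is not the patch itself. `fake_read` and `rmw_write` are hypothetical stand-ins, and the 512-byte buffer size is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a read that may fail. */
static int fake_read(void *buf, int should_fail)
{
    (void)buf;
    return should_fail ? -EIO : 0;
}

/* Allocates a head and a tail bounce buffer. Every exit goes through
 * the 'out' label, so neither buffer leaks when a read fails. */
static int rmw_write(int fail_head, int fail_tail)
{
    void *head_buf = NULL;
    void *tail_buf = NULL;
    int ret;

    head_buf = malloc(512);
    if (!head_buf) {
        ret = -ENOMEM;
        goto out;
    }
    ret = fake_read(head_buf, fail_head);
    if (ret < 0) {
        goto out;
    }

    tail_buf = malloc(512);
    if (!tail_buf) {
        ret = -ENOMEM;
        goto out;
    }
    ret = fake_read(tail_buf, fail_tail);
    if (ret < 0) {
        goto out;
    }

    ret = 0;
out:
    free(tail_buf);   /* free(NULL) is a no-op */
    free(head_buf);
    return ret;
}
```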

block: Fail gracefully with missing filename · 765003db

Committed by Kevin Wolf on Feb 03, 2014

This fixes a regression introduced in commit 2a05cbe4 ('block: Allow
block devices without files'):

$ qemu-system-x86_64 -drive driver=file
qemu-system-x86_64: block.c:892: bdrv_open_common: Assertion
`!drv->bdrv_needs_filename || filename != ((void *)0)' failed.

Now the respective check must be performed not only in bdrv_file_open(),
but also in bdrv_open().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

765003db

25 Jan, 2014 25 commits

block: Switch bdrv_io_limits_intercept() to byte granularity · d5103588

Committed by Kevin Wolf on Jan 16, 2014

Request sizes used to be rounded down to the next sector boundary,
which allowed requests to bypass the I/O limit. Now all requests are
accounted for with their exact byte size.
Reported-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

d5103588
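
The difference between the two accounting schemes can be shown with a little arithmetic. This is a sketch only: the function names are hypothetical and the 512-byte sector size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u   /* assumed sector size */

/* Old-style accounting: the size is rounded down to whole sectors,
 * so anything smaller than one sector is counted as zero bytes. */
static uint64_t accounted_by_sector(uint64_t bytes)
{
    return (bytes / SECTOR_SIZE) * SECTOR_SIZE;
}

/* Byte-granularity accounting: the exact request size is counted. */
static uint64_t accounted_by_byte(uint64_t bytes)
{
    return bytes;
}
```

With sector rounding, a stream of 511-byte requests would never consume any of the limit; with byte accounting each one is charged in full.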

qemu-iotests: Test pwritev RMW logic · 9e1cb96d

Committed by Kevin Wolf on Jan 14, 2014

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

9e1cb96d

block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper · 8407d5d7

Committed by Kevin Wolf on Dec 05, 2013

Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_pwritev().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

8407d5d7

block: Make bdrv_pread() a bdrv_prwv_co() wrapper · a3ef6571

Committed by Kevin Wolf on Dec 05, 2013

Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_preadv().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

a3ef6571

block: Change coroutine wrapper to byte granularity · 775aa8b6

Committed by Kevin Wolf on Dec 05, 2013

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

775aa8b6

block: Assert serialisation assumptions in pwritev · 28de2dcd

Committed by Kevin Wolf on Jan 14, 2014

If a request calls wait_serialising_requests() and actually has to wait
in this function (i.e. a coroutine yield), other requests can run and
previously read data (like the head or tail buffer) could become
outdated. In this case, we would have to restart from the beginning to
read in the updated data.

However, we're lucky and don't actually need to do that: A request can
only wait in the first call of wait_serialising_requests() because we
mark it as serialising before that call, so any later requests would
wait. So as we don't wait in practice, we don't have to reload the data.

This is an important assumption that must not be broken, or data
corruption will happen. Document it with some assertions.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

28de2dcd

block: Align requests in bdrv_co_do_pwritev() · 3b8242e0

Committed by Kevin Wolf on Dec 03, 2013

This patch changes bdrv_co_do_pwritev() to actually be what its name
promises. If requests aren't properly aligned, it performs a RMW.

Requests touching the same block are serialised against the RMW request.
Further optimisation of this is possible by differentiating types of
requests (concurrent reads should actually be okay here).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

3b8242e0
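
The read-modify-write (RMW) arithmetic behind an unaligned write can be sketched as follows. This is not the code from the patch: `AlignedRange` and `align_request` are hypothetical names, and the sketch assumes the alignment is a power of two. It only computes how many bytes have to be read in front of and behind the caller's data so that the request passed down starts and ends on an alignment boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the aligned range to write, plus the number
 * of bytes that must be read before (head) and after (tail) the data. */
typedef struct AlignedRange {
    uint64_t start;
    uint64_t end;
    uint64_t head_bytes;
    uint64_t tail_bytes;
} AlignedRange;

static AlignedRange align_request(uint64_t offset, uint64_t bytes,
                                  uint64_t align)
{
    AlignedRange r;
    uint64_t end = offset + bytes;

    /* Assumption of this sketch: align is a non-zero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);

    r.start = offset & ~(align - 1);               /* round down */
    r.end = (end + align - 1) & ~(align - 1);      /* round up */
    r.head_bytes = offset - r.start;
    r.tail_bytes = r.end - end;
    return r;
}
```

For a 100-byte write at offset 700 with 512-byte alignment, the aligned range is [512, 1024), with 188 bytes of head and 224 bytes of tail to be filled in from a prior read.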

block: Allow wait_serialising_requests() at any point · 6460440f

Committed by Kevin Wolf on Dec 13, 2013

We can only have a single wait_serialising_requests() call per request
because otherwise we can run into deadlocks where requests are waiting
for each other. The same is true when wait_serialising_requests() is not
at the very beginning of a request, so that other requests can be issued
between the start of the tracking and wait_serialising_requests().

Fix this by changing wait_serialising_requests() to ignore requests that
are already (directly or indirectly) waiting for the calling request.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

6460440f
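
One way to picture "directly or indirectly waiting for the calling request" is a chain of wait-for links. The sketch below is an illustration under assumptions, not the data structure or logic used by the patch: the `Request` type, its `waiting_for` field as used here, and `is_waiting_for` are hypothetical, and the chain is assumed to be acyclic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request record: each request waits for at most one
 * other request at a time (NULL if it is not waiting). */
typedef struct Request {
    struct Request *waiting_for;
} Request;

/* Returns true if 'req' waits for 'self', either directly or through
 * a chain of other waiting requests. Such a request can be skipped by
 * 'self' when it looks for requests to wait on, which avoids a cycle.
 * Assumes the wait-for chain itself contains no cycle. */
static bool is_waiting_for(const Request *req, const Request *self)
{
    while (req != NULL) {
        if (req->waiting_for == self) {
            return true;
        }
        req = req->waiting_for;
    }
    return false;
}
```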

block: Make overlap range for serialisation dynamic · 7327145f

Committed by Kevin Wolf on Dec 04, 2013

Copy on Read wants to serialise with all requests touching the same
cluster, so wait_serialising_requests() rounded to cluster boundaries.
Other users like alignment RMW will have different requirements, though
(requests touching the same sector), so make it dynamic.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

7327145f
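
Why the granularity matters can be seen by testing the same two requests for a conflict at different alignments. This is a sketch, not QEMU code: `requests_conflict` is a hypothetical helper, and the 512-byte sector and 64 KiB cluster sizes used in the usage note are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: widens both requests to 'align' boundaries
 * (align must be a power of two) and reports whether the widened
 * ranges intersect. */
static bool requests_conflict(uint64_t off_a, uint64_t len_a,
                              uint64_t off_b, uint64_t len_b,
                              uint64_t align)
{
    uint64_t start_a = off_a & ~(align - 1);
    uint64_t end_a = (off_a + len_a + align - 1) & ~(align - 1);
    uint64_t start_b = off_b & ~(align - 1);
    uint64_t end_b = (off_b + len_b + align - 1) & ~(align - 1);

    return start_a < end_b && start_b < end_a;
}
```

Two 100-byte requests at offsets 0 and 600 fall into different 512-byte sectors, so they do not conflict at sector granularity, but they share a 64 KiB cluster and therefore conflict at cluster granularity.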

block: Generalise and optimise COR serialisation · 2dbafdc0

Committed by Kevin Wolf on Dec 04, 2013

Change the API so that specific requests can be marked serialising. Only
these requests are checked for overlaps then.

This means that during a Copy on Read operation, not all requests
overlapping other requests are serialised any more, but only those that
actually overlap with the specific COR request.

Also remove COR from function and variable names because this
functionality can be useful in other contexts.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

2dbafdc0

block: Make zero-after-EOF work with larger alignment · ec746e10

Committed by Kevin Wolf on Dec 04, 2013

Odd file sizes could make bdrv_aligned_preadv() shorten the request in
non-aligned ways. Fix it by rounding to the required alignment instead
of 512 bytes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

ec746e10

block: Allow waiting for overlapping requests between begin/end · 65afd211

Committed by Kevin Wolf on Dec 03, 2013

Previously, it was not possible to use wait_for_overlapping_requests()
between tracked_request_begin()/end() because it would wait for itself.

Ignore the current request in the overlap check and run more of the
bdrv_co_do_preadv/pwritev code with a BdrvTrackedRequest present.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

65afd211

block: Switch BdrvTrackedRequest to byte granularity · 793ed47a

Committed by Kevin Wolf on Dec 03, 2013

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

793ed47a

block: Introduce bdrv_co_do_pwritev() · 6601553e

Committed by Kevin Wolf on Dec 03, 2013

This is going to become the bdrv_co_do_preadv() equivalent for writes.
In this patch, however, only a function taking byte offsets is created;
it doesn't align anything yet.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

6601553e

block: write: Handle COR dependency after I/O throttling · 244eadef

Committed by Kevin Wolf on Dec 03, 2013

First waiting for all COR requests to complete and calling the
throttling function afterwards means that the request could be delayed
and we still need to wait for the COR request even if it was issued only
after the throttled write request.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

244eadef

block: Introduce bdrv_aligned_pwritev() · b404f720

Committed by Kevin Wolf on Dec 03, 2013

This separates the part of bdrv_co_do_writev() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

b404f720

block: Introduce bdrv_co_do_preadv() · 1b0288ae

Committed by Kevin Wolf on Dec 02, 2013

Similar to bdrv_pread(), which aligns byte-aligned request to 512 byte
sectors, bdrv_co_do_preadv() takes a byte-aligned request and aligns it
to the alignment specified in bs->request_alignment.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

1b0288ae

block: Introduce bdrv_aligned_preadv() · d0c7f642

Committed by Kevin Wolf on Dec 02, 2013

This separates the part of bdrv_co_do_readv() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

d0c7f642

raw: Probe required direct I/O alignment · c25f53b0

Committed by Paolo Bonzini on Nov 29, 2011

Add a bs->request_alignment field that contains the required
offset/length alignment for I/O requests and fill it in the raw block
drivers. Use ioctls if possible, else see what alignment it takes for
O_DIRECT to succeed.

While at it, also expose the memory alignment requirements, which may be
(and in practice are) different from the disk alignment requirements.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

c25f53b0
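
The fallback described above ("see what alignment it takes for O_DIRECT to succeed") amounts to trying increasing sizes until an I/O attempt works. The sketch below shows that loop in isolation. It is not the driver code: `probe_fn`, `probe_alignment`, and `fake_device_io` are hypothetical, and the real I/O attempt is replaced by a callback so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: attempts an I/O of the given alignment and
 * returns 0 on success or a negative value on failure. */
typedef int (*probe_fn)(size_t align, void *opaque);

/* Tries powers of two from 512 up to max_align and returns the first
 * alignment for which the I/O attempt succeeds, or 0 if none does. */
static size_t probe_alignment(probe_fn try_io, void *opaque,
                              size_t max_align)
{
    for (size_t align = 512; align <= max_align; align <<= 1) {
        if (try_io(align, opaque) == 0) {
            return align;
        }
    }
    return 0;
}

/* Fake device for demonstration: 'opaque' points to the alignment it
 * requires; any multiple of that alignment is accepted. */
static int fake_device_io(size_t align, void *opaque)
{
    size_t required = *(size_t *)opaque;
    return (align % required == 0) ? 0 : -1;
}
```

In a real driver the callback would issue an actual direct-I/O read; the candidate range of 512 to 4096 bytes used in the tests is an assumption of this sketch.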

block: rename buffer_alignment to guest_block_size · 1b7fd729

Committed by Paolo Bonzini on Nov 29, 2011

The alignment field is now set to the value that is promised to the
guest, rather than required by the host.  The next patches will make
QEMU aware of the host-provided values, so make this clear.

The alignment is also not about memory buffers, but about the sectors on
the disk, so change the documentation of the field accordingly.

At this point, the field is set by the device emulation, but completely
ignored by the block layer.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

1b7fd729

block: Don't use guest sector size for qemu_blockalign() · 339064d5

Committed by Kevin Wolf on Nov 28, 2013

bs->buffer_alignment is set by the device emulation and contains the
logical block size of the guest device. This isn't something that the
block layer should know, and even less something to use for determining
the right alignment of buffers to be used for the host.

The new BlockLimits field opt_mem_alignment tells the qemu block layer
the optimal alignment to be used so that no bounce buffer must be used
in the driver.

This patch may change the buffer alignment from 4k to 512 for all
callers that used qemu_blockalign() with the top-level image format
BlockDriverState. The value was never propagated to other levels in the
tree, so in particular raw-posix never required anything else than 512.

While on disks with 4k sectors direct I/O requires a 4k alignment,
memory may still be okay when aligned to 512 byte boundaries. This is
what must have happened in practice, because otherwise this would
already have failed earlier. Therefore I don't expect regressions even
with this intermediate state. Later, raw-posix can implement the hook
and expose a different memory alignment requirement.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

339064d5

block: Detect unaligned length in bdrv_qiov_is_aligned() · 1ff735bd

Committed by Kevin Wolf on Dec 05, 2013

For an O_DIRECT request to succeed, it's not only necessary that all
base addresses in the qiov are aligned, but also that each length in it
is aligned.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

1ff735bd
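
The two conditions named in this message, aligned base addresses and aligned lengths, can be checked in one pass over the vector. The sketch below is an illustration, not the function from the patch: `IoVec` is a hypothetical stand-in for an I/O vector element and `iov_is_aligned` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical I/O vector element: a buffer address and its length. */
typedef struct IoVec {
    void *base;
    size_t len;
} IoVec;

/* Returns true only if every element has both an aligned base address
 * and an aligned length. 'align' must be non-zero. */
static bool iov_is_aligned(const IoVec *iov, int niov, size_t align)
{
    for (int i = 0; i < niov; i++) {
        if ((uintptr_t)iov[i].base % align != 0) {
            return false;   /* misaligned buffer address */
        }
        if (iov[i].len % align != 0) {
            return false;   /* misaligned length */
        }
    }
    return true;
}
```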

block: Update BlockLimits when they might have changed · 355ef4ac

Committed by Kevin Wolf on Dec 11, 2013

When reopening with different flags, or when backing files disappear
from the chain, the limits may change. Make sure they get updated in
these cases.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>

355ef4ac

block: Inherit opt_transfer_length · 466ad822

Committed by Kevin Wolf on Dec 11, 2013

When there is a format driver between the backend, it's not guaranteed
that exposing the opt_transfer_length for the format driver results in
the optimal requests (because of fragmentation etc.), but it can't make
things worse, so let's just do it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>

466ad822

block: Move initialisation of BlockLimits to bdrv_refresh_limits() · d34682cd

Committed by Kevin Wolf on Dec 11, 2013

This function separates filling the BlockLimits from bdrv_open(), which
allows calling it from other operations that may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

d34682cd

24 Jan, 2014 7 commits

block: Fix bdrv_commit return value · dabfa6cc

Committed by Kevin Wolf on Jan 24, 2014

bdrv_commit() could return 0 or 1 on success, depending on whether or
not the last sector was allocated in the overlay and whether the overlay
format had a .bdrv_make_empty callback.

Most callers ignored it, but qemu-img commit would print an error
message while the operation actually succeeded.

Also clean up the handling of I/O errors to return the real error code
instead of -EIO.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

dabfa6cc

block: resize backing file image during offline commit, if necessary · 72706ea4

Committed by Jeff Cody on Jan 24, 2014

Currently, if an image file is logically larger than its backing file,
committing it via 'qemu-img commit' will fail.

For instance, if we have a base image with a virtual size 10G, and a
snapshot image of size 20G, then committing the snapshot offline with
'qemu-img commit' will likely fail.

This will automatically attempt to resize the base image, if the
snapshot image to be committed is larger.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

72706ea4

block: Create authorizations mechanism for external snapshot and resize. · 212a5a8f

Committed by Benoît Canet on Jan 23, 2014

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

212a5a8f

qmp: Allow to change password on named block driver states. · 12d3ba82

Committed by Benoît Canet on Jan 23, 2014

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>

There were two candidate ways to implement named node manipulation:

1)
{ 'command': 'block_passwd', 'data': {'*device': 'str',
                                      '*node-name': 'str', 'password': 'str'}
}

2)

{ 'command': 'block_passwd', 'data': {'device': 'str',
                                      '*device-is-node': 'bool',
                                      'password': 'str'} }

Luiz proposed 2, said that 1 was an abuse of the QMP interface, and proposed to
rewrite the QMP block interface for 2.0.

What Luiz does not like in 1 is that two fields are optional but one of them must
be specified, which leads to an abuse of the QMP semantics.

Kevin argued that 2 was a clear abuse of the device field and would not be
practical when skimming a log file, because the user would read "device"
and think that a device is being manipulated when it is in fact a node name.
The documentation of 1 makes it pretty clear to the user what to do.

Kevin argued that all bs are nodes, including device ones, so 2 does not make
sense.

Kevin also argued that rewriting the QMP block interface would not make the
current one disappear.

Kevin pushed the argument that making the QAPI generator compatible with the
semantics of the operation would need a rewrite that no one has done yet.

A vote was held on the list to choose the version to use, and 1 won.

For reference the complete thread is:
"[Qemu-devel] [PATCH V4 4/7] qmp: Allow to change password on names block driver
states."
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

12d3ba82
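
The constraint discussed for option 1, two optional fields of which one must be given, has to be enforced by the command handler rather than by the schema. A minimal sketch of such a check follows. `check_target` is a hypothetical helper, not a QEMU function, and the message does not say how the case where both fields are given is handled; this sketch simply accepts it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical validation: both arguments are optional (NULL when the
 * field is absent), but at least one of them must be present.
 * Returns 0 if the request names a target, -EINVAL otherwise. */
static int check_target(const char *device, const char *node_name)
{
    if (device == NULL && node_name == NULL) {
        return -EINVAL;
    }
    return 0;
}
```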

qmp: Add QMP query-named-block-nodes to list the named BlockDriverState nodes. · c13163fb

Committed by Benoît Canet on Jan 23, 2014

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

c13163fb

block: Allow the user to define "node-name" option both on command line and QMP. · 6913c0c2

Committed by Benoît Canet on Jan 23, 2014

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

6913c0c2

block: Add bs->node_name to hold the name of a bs node of the bs graph. · dc364f4c

Committed by Benoît Canet on Jan 23, 2014

Add the minimum of code to prepare for the following patches.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

dc364f4c

22 Jan, 2014 5 commits

block: fix backing file segfault · d80ac658

Committed by Peter Feiner on Jan 08, 2014

When a backing file is opened such that (1) a protocol is directly
used as the block driver and (2) the block driver has bdrv_file_open,
bdrv_open_backing_file segfaults. The problem arises because
bdrv_open_common returns without setting bd->backing_hd->file.

To effect (1), you seem to have to use the -F flag in qemu-img. There
are several block drivers that satisfy (2), such as "file" and "nbd".
Here are some concrete examples:

    #!/bin/bash

    echo Test file format
    ./qemu-img create -f file base.file 1m
    ./qemu-img create -f qcow2 -F file -o backing_file=base.file\
        file-overlay.qcow2
    ./qemu-img convert -O raw file-overlay.qcow2 file-convert.raw

    echo Test nbd format
    SOCK=$PWD/nbd.sock
    ./qemu-img create -f raw base.raw 1m
    ./qemu-nbd -t -k $SOCK base.raw &
    trap "kill $!" EXIT
    while ! test -e $SOCK; do sleep 1; done
    ./qemu-img create -f qcow2 -F nbd -o backing_file=nbd:unix:$SOCK\
        nbd-overlay.qcow2
    ./qemu-img convert -O raw nbd-overlay.qcow2 nbd-convert.raw

Without this patch, the two qemu-img convert commands segfault.

This is a regression that was introduced in v1.7 by
dbecebdd.
Signed-off-by: Peter Feiner <peter@gridcentric.ca>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

d80ac658

block: Allow recursive "file"s · 505d7583

Committed by Max Reitz on Dec 20, 2013

It should be possible to use a format as a driver for a file which in
turn requires another file, i.e., nesting file formats.

Allowing nested file formats results in e.g. qcow2 BlockDriverStates
never being directly passed to bdrv_open_common() from bdrv_file_open(),
but instead being handed through bdrv_open(). This changes the error
message when trying to give a filename to qcow2, i.e. trying to use it
as a driver for the protocol level. Therefore, change the reference
output of I/O test 051 accordingly.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

505d7583

block: Use bdrv_open_image() in bdrv_open() · 054963f8

Committed by Max Reitz on Dec 20, 2013

Using bdrv_open_image() instead of bdrv_file_open() directly in
bdrv_open() is easier.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

054963f8

block: Add bdrv_open_image() · da557aac

Committed by Max Reitz on Dec 20, 2013

Add a common function for opening images to be used for block drivers
specified through BlockdevRefs in an option QDict. The difference from
bdrv_file_open() is that this function may invoke bdrv_open() instead,
allowing auto-detection of the driver to be used; and second, it
automatically extracts the BlockdevRef from the option QDict.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

da557aac

block: Allow block devices without files · 2a05cbe4

Committed by Max Reitz on Dec 20, 2013

blkdebug and blkverify will, in order to retain compatibility, not
support the field "file" implicitly through bdrv_open(). In order to be
able to use those drivers without giving a filename anyway, it is
necessary to be able to have block devices without files implicitly
opened by bdrv_open(). This is the case, if there was neither a file
name, a reference to an existing block device to use as a file nor
options specific to the file.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

2a05cbe4